Workload Based Device Access

ABSTRACT

Technologies are described to perform workload based device access. An input-output (IO) request is received from an application. An application profile for the application is determined. Based on the application profile, one or more IO parameter values to access a device are set. The device is accessed based on the one or more IO parameter values to fulfill the IO request.

This application is a continuation of U.S. patent application Ser. No. 16/579,771, filed Sep. 23, 2019, entitled, “Workload based device access,” which claims priority to U.S. Provisional Patent Application No. 62/735,330, filed Sep. 24, 2018, entitled, “Workload based device access” and is related to U.S. provisional application Ser. No. 62/651,995 filed on Apr. 3, 2018, and entitled, “Workload based storage optimization”, which are incorporated by reference in their entirety.

BACKGROUND

Modern hardware configurations that provide access to storage, compute servers, network bandwidth, etc. are optimized for local performance improvements.

For example, storage may be configured with multiple storage devices, e.g., in a redundant fallback configuration, to provide reliability. In another example, storage may be configured with additional layers, such as one or more caches, to provide an improved speed of access. In another example, an abstraction of storage, e.g., via an application program interface (API) call, may be provided to an application, with physical storage devices configured to process requests received via the API. In each of these examples, an application that accesses storage may be unaware of the actual storage configuration, performance parameters, reliability, etc. Further, as the demand from an application changes, storage configurations need to be updated to provide acceptable levels of performance.

In these configurations, optimizations are local to the storage configuration. For example, storage configurations may be provided with additional caches, e.g., if it is detected that quality of service (QoS) parameters of the speed of access for an application are not being met. In another example, meeting reliability QoS may be achieved by adding multiple redundant storage devices, e.g., hard drives, solid state devices (SSDs), etc. such that the failure of individual devices does not cause loss of data. Further, techniques such as error-detection and error-correction codes may also be implemented.

While such configurations may provide applications with storage that meets QoS, the configurations are expensive, e.g., due to additional hardware requirements (e.g., caches, redundant storage devices, etc.). Further, these configurations fail to provide predictable performance to an application. For example, when storage requests from an application experience a greater rate of cache hits, the application may experience better mean QoS than when the rate of cache hits is lower, but correspondingly a much larger tail latency when there is a cache miss. In another example, an application may experience different performance when the storage request accesses different hardware, e.g., an SSD with a higher bit-error rate (e.g., due to aging) may be slower than another SSD with a lower bit-error rate.

Many modern systems are implemented in a multi-tenant configuration. For example, virtualization technology enables multiple software applications to share the same physical compute hardware, access the same physical storage devices, and exchange data over the same physical network equipment.

While multi-tenancy offers several benefits, it can lead to greater unpredictability in performance. For example, if multiple applications attempt to access the same resource (e.g., storage drive, processor, etc.) at the same time, one or more of the applications may experience lower performance than when such requests are made at different times. Overprovisioning is one strategy to provide predictable performance; however, overprovisioning is expensive.

This disclosure was conceived in light of some of these problems.

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

SUMMARY

Embodiments generally relate to a computer-implemented method to fulfill an input-output (IO) request from an application. The method comprises receiving an input-output (IO) request from an application. The method further comprises determining an application profile for the application. The method further comprises setting one or more IO parameter values to access a device based at least in part on the application profile. The method further comprises accessing the device based on the one or more IO parameter values to fulfill the request.

In some embodiments, the method further includes determining an application type of the application based on a configuration setting prior to receiving the IO request, and in response to detecting that the application has launched. In some embodiments, the application type is determined as unknown, and setting the one or more IO parameters is based on a default template that includes default values for the one or more IO parameters.
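By way of a non-limiting illustration, the fallback to a default template for an unknown application type may be sketched as follows. The parameter names, the template values, and the application type labels ("database", "video-streaming") are hypothetical and are chosen only for the example; they are not specified by this disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IOParams:
    """Hypothetical subset of IO parameters; all default values are illustrative."""
    cache_type: str = "write-through"
    read_buffer_bytes: int = 64 * 1024
    write_buffer_bytes: int = 64 * 1024
    queue_depth: int = 32
    journaling: bool = True
    error_tolerance: str = "low"


# Default template used when the application type is determined as unknown.
DEFAULT_TEMPLATE = IOParams()

# Hypothetical per-type templates, expressed as overrides of the default.
TEMPLATES = {
    "database": replace(DEFAULT_TEMPLATE, queue_depth=128),
    "video-streaming": replace(
        DEFAULT_TEMPLATE,
        read_buffer_bytes=1024 * 1024,
        journaling=False,
        error_tolerance="high",  # an application profiled as tolerant of errors
    ),
}


def params_for(app_type: str) -> IOParams:
    """Return the template for a known type, else the default template."""
    return TEMPLATES.get(app_type, DEFAULT_TEMPLATE)
```

In this sketch, `params_for("unknown")` returns the default template unchanged, while a recognized type receives only the overrides listed for it.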

In some embodiments, receiving the IO request comprises receiving at least one of the one or more IO parameter values from the application. In some embodiments, the one or more IO parameter values include values of one or more of a cache type parameter, a read-buffer parameter, a write-buffer parameter, a queue parameter, a journaling parameter, a mapping parameter, an error tolerance parameter, an access-type optimization parameter, or a storage-container parameter.

In some embodiments, the method further comprises determining, based on the application profile, that the application is tolerant of errors, and in response to determining that the application is tolerant of errors, setting the error tolerance parameter to a high value. In some embodiments, the application profile is based on one or more of an application type of the application determined based on an application identifier, a network port associated with the application, a source language for application code, an application execution environment in which the application executes, or application program code. In some embodiments, the application program code includes bytecode, compiled code, or source code.

In some embodiments, determining the application profile is based at least in part on a plurality of prior IO requests from the application. In some embodiments, the method further comprises analyzing the plurality of prior IO requests to determine a respective proportion of create, read, update, and delete (CRUD) operations in the plurality of prior IO requests. In some embodiments, the method further comprises analyzing the plurality of prior IO requests to determine a proportion of IO requests that result in a cache invalidation or a cache miss. In some embodiments, the method further comprises determining a rate of IO request arrival based on the plurality of prior IO requests. In some embodiments, the method further comprises analyzing a size in bits of the plurality of prior IO requests, and based on the size of the plurality of prior IO requests, determining a bandwidth used by the application.
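As one illustrative sketch of such an analysis, the following computes the CRUD proportions, the arrival rate, and the bandwidth from a list of prior requests observed over a window. The `PriorRequest` record and the `window_s` argument are assumptions introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass

CRUD_OPS = ("create", "read", "update", "delete")


@dataclass(frozen=True)
class PriorRequest:
    """Hypothetical record of one prior IO request."""
    op: str         # one of CRUD_OPS
    size_bits: int  # size of the request in bits


def crud_proportions(requests):
    """Fraction of prior requests of each CRUD type."""
    counts = Counter(r.op for r in requests)
    total = len(requests)
    return {op: (counts[op] / total if total else 0.0) for op in CRUD_OPS}


def arrival_rate(requests, window_s):
    """Requests per second over an observation window of window_s seconds."""
    return len(requests) / window_s


def bandwidth_bits_per_s(requests, window_s):
    """Bits per second transferred over the same observation window."""
    return sum(r.size_bits for r in requests) / window_s
```

For example, four requests (two reads, one update, one create) totaling 40,000 bits over a two-second window yield a read proportion of 0.5, an arrival rate of 2 requests per second, and a bandwidth of 20,000 bits per second.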

In some embodiments, the method further comprises analyzing a size in bits of the plurality of prior IO requests to determine one or more of: an average size, a median size, a maximum size, a minimum size, or a frequency distribution of the size. In some embodiments, the frequency distribution is a normal distribution, wherein the method further comprises allocating a buffer for the application, and wherein a size of the buffer is within three sigma of a mean of the frequency distribution.
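A minimal sketch of the size analysis and of a three-sigma buffer size follows. Choosing the upper end of the range (mean plus three sigma, rounded down so the result stays within three sigma) is an assumption made for the example; the text above only requires that the size fall within three sigma of the mean.

```python
import math
import statistics


def size_summary(sizes_bits):
    """Average, median, maximum, and minimum of prior request sizes."""
    return {
        "mean": statistics.fmean(sizes_bits),
        "median": statistics.median(sizes_bits),
        "max": max(sizes_bits),
        "min": min(sizes_bits),
    }


def three_sigma_buffer_bits(sizes_bits):
    """Buffer size at the upper three-sigma bound of the observed sizes.

    Assumes the sizes are approximately normally distributed. The result is
    rounded down so that it does not exceed mean + 3 * sigma.
    """
    mean = statistics.fmean(sizes_bits)
    sigma = statistics.pstdev(sizes_bits)  # population standard deviation
    return math.floor(mean + 3 * sigma)
```

For a normal distribution, roughly 99.7% of request sizes lie within three sigma of the mean, so a buffer sized at the upper bound would be expected to hold nearly all single requests without splitting.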

In some embodiments, determining the application profile based on the plurality of prior IO requests comprises grouping the one or more prior IO requests into one or more request groups based on a time of arrival of each of the plurality of prior IO requests, wherein each request group is associated with a respective sampling period, and determining a value of a particular characteristic of the plurality of prior IO requests in each request group. In some embodiments, the method further comprises assigning a respective weight to each request group prior to determining the value of the particular characteristic. In some embodiments, the weights are assigned such that a first request group associated with a recent sampling period is assigned a higher weight than a second request group associated with an earlier sampling period.
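The grouping and weighting described above may be sketched as follows. The exponential decay of weights with group age, the `decay` value, and the use of the per-group mean as the characteristic are assumptions chosen for illustration; any weighting in which more recent sampling periods receive higher weights would fit the description.

```python
from collections import defaultdict


def group_by_period(samples, period_s):
    """Group (arrival_s, value) samples by sampling period index."""
    groups = defaultdict(list)
    for arrival_s, value in samples:
        groups[int(arrival_s // period_s)].append(value)
    return dict(groups)


def weighted_characteristic(groups, decay=0.5):
    """Weighted average of per-group means, favoring recent groups.

    The newest group has weight 1; each older sampling period is weighted
    by a further factor of `decay` (hypothetical weighting scheme).
    """
    newest = max(groups)
    weighted_sum = 0.0
    weight_total = 0.0
    for index, values in groups.items():
        weight = decay ** (newest - index)
        weighted_sum += weight * (sum(values) / len(values))
        weight_total += weight
    return weighted_sum / weight_total
```

With request sizes of 100 and 300 in the first one-second period and 400 and 800 in the second, the per-group means are 200 and 600, and the weighted value is (600 + 0.5 × 200) / 1.5, which is closer to the recent group than the unweighted mean of 400.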

In some embodiments, each of the plurality of prior IO requests has a time of arrival within N seconds of receiving the IO request, and wherein N is an integer.

In some embodiments, the method further comprises allocating a buffer for the application, wherein a size of the buffer is determined based on the one or more prior IO requests from the application. In some embodiments, the size of the buffer is based on a respective proportion of each type of IO operation in the plurality of prior IO requests, and wherein the type is create, read, update, or delete (CRUD). In some embodiments, the buffer includes a respective sub-buffer for each type of request, and wherein the size of each respective sub-buffer is based on the proportion of the respective type of IO operation. In some embodiments, the one or more prior IO requests are requests to read data from the device, and wherein the size of the read-buffer is based on a size of data read for each of the one or more IO requests.
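One possible way to size the sub-buffers in proportion to the observed operation types is sketched below. Integer arithmetic on operation counts is used so that the sub-buffer sizes always sum to the total buffer size; assigning the rounding remainder to the most frequent operation type is an assumption of the example.

```python
def partition_buffer(total_bytes, op_counts):
    """Split a buffer into sub-buffers proportional to per-type counts.

    op_counts maps an operation type (e.g., "read") to the number of prior
    IO operations of that type. Any bytes left over from integer rounding
    are given to the most frequent type (hypothetical policy).
    """
    total_ops = sum(op_counts.values())
    if total_ops == 0:
        return {op: 0 for op in op_counts}
    sizes = {op: total_bytes * n // total_ops for op, n in op_counts.items()}
    remainder = total_bytes - sum(sizes.values())
    most_frequent = max(op_counts, key=op_counts.get)
    sizes[most_frequent] += remainder
    return sizes
```

For example, with counts of 10 creates, 60 reads, 25 updates, and 5 deletes, a 1,000-byte buffer is partitioned into sub-buffers of 100, 600, 250, and 50 bytes, respectively.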

In some embodiments, the IO request includes a plurality of IO operations, and the method further comprises determining a respective size of the plurality of input or output operations and grouping the plurality of input or output operations into one or more groups, wherein a combined size of the operations in each group is less than or equal to the size of the buffer. In some embodiments, the method further comprises determining that a size of an input or output operation specified in the IO request is larger than the size of the buffer and splitting the input or output operation into a plurality of sub-operations, wherein each sub-operation has a respective size that is less than or equal to the size of the buffer.
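The grouping and splitting of operations against a buffer size may be illustrated with the following sketch. The greedy, in-order packing strategy is an assumption of the example; other packing strategies would also satisfy the constraint that each group fits within the buffer.

```python
def split_operation(size, buffer_size):
    """Split one operation into sub-operations no larger than the buffer."""
    full, rest = divmod(size, buffer_size)
    return [buffer_size] * full + ([rest] if rest else [])


def group_operations(sizes, buffer_size):
    """Pack operations, in order, into groups that each fit in the buffer.

    Operations larger than the buffer are first split into sub-operations.
    A new group is started whenever the next piece would overflow the
    current group (greedy packing, chosen here only for simplicity).
    """
    groups, current, used = [], [], 0
    for size in sizes:
        for part in split_operation(size, buffer_size):
            if used + part > buffer_size:
                groups.append(current)
                current, used = [], 0
            current.append(part)
            used += part
    if current:
        groups.append(current)
    return groups
```

With a buffer size of 8, operations of sizes 3, 2, 4, 10, and 1 are grouped as [3, 2], [4], [8], and [2, 1], where the operation of size 10 has been split into sub-operations of sizes 8 and 2.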

In some embodiments, the device includes a storage device, and the application profile includes one or more of a storage capacity requirement, a storage bandwidth requirement, a storage access type, and a storage block size. In some implementations in which the device includes a storage device, the method further comprises determining one or more hardware characteristics of the storage device. In these embodiments, accessing the device comprises accessing the storage device based on the one or more hardware characteristics. In some embodiments, the one or more hardware characteristics include a physical type of storage units in the storage device, a block size configured for the storage device, one or more configuration parameters of the storage device, or a size of the storage device.

Embodiments also relate to a computer-implemented method to access a device to fulfill an IO request from an application. The method includes receiving an IO request from an application. The method further includes determining an application profile for the application. The method further includes allocating a buffer for the application, wherein a size of the buffer is determined based on one or more prior IO requests from the application. The method further includes accessing a device to fulfill the request using the buffer to store application data.

In some embodiments, the size of the buffer is based on a respective type and proportion of each IO operation in the one or more prior IO requests, and wherein the respective type is one of create, read, update, or delete (CRUD). In some embodiments, the buffer is partitioned into respective sub-buffers for each of CRUD, and wherein the size of the sub-buffer is based on the respective type.

In some embodiments in which the IO request includes a plurality of IO operations, the method further comprises determining a respective size of the plurality of IO operations and grouping the plurality of input or output operations into one or more groups, wherein each group has a combined size that is less than or equal to the size of the buffer. In some embodiments in which the IO request includes a single input or output operation, the method further comprises determining that a size of the input or output operation is larger than the size of the buffer and splitting the IO operation into a plurality of input or output sub-operations, wherein each sub-operation has a respective size that is less than or equal to the size of the buffer.

Embodiments also relate to a computer-implemented method to transfer data between a storage device and a software application that executes in an application execution environment. The method comprises receiving a data transfer request from the software application. The method further comprises identifying the storage device from a plurality of storage devices based on the data transfer request. The method further comprises sending a command directly from the software application to the storage device. The method further comprises receiving a response to the command. The method further comprises providing the response to the software application.

In some embodiments, the data transfer request is to write data to the storage device. In some embodiments, the data comprises one or more data units, and the command specifies a respective physical address within one or more individual storage units of the storage device for the one or more data units. In some embodiments, the method further comprises sending the data to the storage device. In some embodiments, sending the data to the storage device is performed without a context switch from the software application to an operating system.

In some embodiments, the command is sent without a context switch from the software application to an operating system. In some embodiments, the data transfer request is to read data from the storage device, and specifies a memory address within user space memory allocated to the software application. In some embodiments, receiving the response comprises receiving the data directly from the storage device without a context switch to an operating system on which the software application executes, and the method further comprises writing the data directly to the user space memory allocated to the software application, based on the memory address.

In some embodiments, the user space memory is allocated to the software application, and providing the response to the software application comprises providing a pointer to the data written to the user space memory.

In some embodiments, the method is implemented in a software driver that executes within the application execution environment.

In some embodiments, the data transfer request is to read data from the storage device, receiving the response to the command comprises receiving the data, and providing the response to the software application comprises writing the data directly to a user space memory allocated to the software application, and after writing the data, providing a pointer to a memory address within the user space memory where the data is written.

In some embodiments, the method is implemented in a software driver that executes within the application execution environment, wherein at least a portion of the user space memory allocated to the software application is shared between the software application and the software driver, and wherein the memory address at which the data is written is within the portion of the user space memory.

In some embodiments, the method is implemented in a software driver that executes within the application execution environment and has access to at least a portion of user space memory allocated to the software application, wherein the data transfer request is to write data to the storage device and includes a pointer to a memory address within the user space memory, and wherein sending the command comprises reading the data directly from the portion of the user space memory based on the pointer, and sending the data to the storage device.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example network environment 100 which may be used for one or more implementations described herein.

FIG. 2 is a flow diagram illustrating one example of a method 200 to access a device to fulfill an input-output (IO) request, according to some implementations.

FIG. 3A illustrates a block diagram of an example computing device 300 which may be used for one or more implementations described herein.

FIG. 3B illustrates a block diagram of the example computing device 300 which may be used for one or more implementations described herein.

FIG. 4 illustrates an example method 400 for data transfer between a software application and a storage device, according to some implementations.

FIG. 5 illustrates a block diagram of an example environment 500 which may be used for one or more implementations described herein.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. The aspects of the present disclosure, as generally described herein, and illustrated in the Figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.

FIG. 1 illustrates a block diagram of an example network environment 100, which may be used in some implementations described herein. In some implementations, network environment 100 includes one or more server systems, e.g., server system 102. Server system 102 (and other server systems in network environment 100) can communicate with each other, with one or more direct attached storage devices, e.g., storage device(s) 170, with networked storage devices 160, 162, and 164, and with other systems (e.g., database systems, client devices, storage devices, etc.) over network 150.

Server system 102 can include one or more server devices. For example, server system 102 may be a single server, e.g., with a single main processing board (motherboard) and one or more processors. In another example, server system 102 may include a plurality of servers (e.g., server devices 104 and 106), e.g., arranged in a server rack, in multiple server racks in a data center, in multiple data centers, etc. In this example, the plurality of servers are configured to communicate with each other via various mechanisms, e.g., over network 150.

A server device (e.g., server device 104, 106) in a server system may be configured to provide one or more application execution environments, e.g., software environments for execution of one or more software applications. A server device may include hardware that supports execution of software applications, e.g., one or more processors (such as a central processing unit (CPU), graphics processing unit (GPU), application specific integrated circuit (ASIC), field programmable gate array (FPGA), etc.), memory (including volatile memory, e.g., dynamic random access memory (DRAM), and/or non-volatile memory, e.g., hard disk, flash memory, magnetoresistive RAM (MRAM), resistive RAM (ReRAM) such as 3D XPoint™, etc.), network interface, and other hardware.

For ease of illustration, FIG. 1 shows one block for server system 102 that includes two server devices 104 and 106. Server blocks 102, 104, and 106 may represent multiple systems, server devices, and other network devices, and the blocks can be provided in different configurations than shown. For example, server system 102 can represent multiple server devices that can communicate with other server systems via the network 150. In some implementations, server system 102 can include cloud hosting servers, for example. In some examples, storage devices 160-164 and/or storage device(s) 170 can be provided in server system block(s) that are separate from server device 104 and can communicate with server device 104 and other server systems via network 150. In some implementations, network environment 100 may not have all of the components shown and/or may have other elements including other types of elements instead of, or in addition to, those described herein.

Server devices may also be referred to as compute devices. For example, a server device or a compute device may include general purpose processing hardware (e.g., CPU, GPU, FPGA, etc.) and/or special purpose processing hardware (e.g., ASIC, accelerators, etc.) that is configured to perform data processing tasks.

Network-attached storage devices 160, 162, and 164, and direct-attached storage device 170 may be any type of storage devices, e.g., that provide long-term and/or short-term data storage. For example, storage devices 160-164 may include volatile memory (e.g., DRAM, static RAM (SRAM), etc.) and/or non-volatile memory (e.g., non-volatile RAM (NVRAM), MRAM, flash memory, hard disk drives, phase change memory, 3D XPoint™, resistive RAM, etc.). In some implementations, e.g., in the example illustrated in FIG. 1, storage devices 160-164 are coupled to server system 102 via network 150, e.g., as a storage area network (SAN), as network attached storage (NAS), etc.

In some implementations, e.g., in the example illustrated in FIG. 1, storage device(s) 170 may be coupled to server device 104 via direct attached storage protocols, e.g., non-volatile memory express (NVME), serial attached SCSI (SAS), etc. In some implementations, a storage device can be coupled to one, two, or more server devices (e.g., server device 104 and server device 106) using non-volatile memory express over fabric (NVMEoF) protocol. Storage device(s) 170 can include a plurality of storage devices, e.g., solid-state disks, hard drives, etc. In some implementations, a storage device of storage device(s) 170 may be coupled to one of server device 104 or server device 106. In some implementations, a storage device of storage device(s) 170 may be coupled to both server devices. In some implementations, direct attached and/or network-attached storage devices may be used. In some implementations, storage devices may be directly coupled to or be a part of server system 102, e.g., coupled to one or more of server devices 104 and 106 via a direct connection (e.g., via peripheral component interconnect (PCI) bus, universal serial bus (USB), etc.). In some implementations, storage devices may include any number of storage devices directly coupled to server system 102, and one or more devices coupled to server system 102 via network 150.

In some implementations, storage devices 160-164 and/or storage device 170 may be solid-state storage devices, e.g., that utilize flash memory or other solid-state data storage technology. In some implementations, a storage device may include a plurality of channels. Each channel may be configured with a plurality of storage chips that can store blocks of data, organized into pages. In some implementations, the plurality of channels may be configured such that only a subset of chips (e.g., a single chip) within a channel can be accessed at a particular instant and other chips are not accessible at the particular instant, e.g., in a serial access configuration. Further, in these implementations, the plurality of channels may be configured to enable concurrent access, e.g., any number of channels (e.g., a subset of the channels, all channels, etc.) may be accessed at any particular instant, e.g., in a parallel access configuration. In some implementations, a storage device may include a storage controller (e.g., a special purpose microprocessor) that facilitates access to the storage device.

In some implementations, network device(s) 180 may be coupled to server system 102 via network 150. Network device(s) 180 may include any type of device that can send data to and receive data from server system 102. For example, network device(s) 180 may include network management devices, e.g., switches, routers; other servers or server systems; etc.

Network 150 may be any type of network that enables various systems to exchange data. Network 150 can be any type of communication network, including one or more of the Internet, local area networks (LAN), wireless networks (e.g., 802.11 networks, Bluetooth®, etc.), switch or hub connections, etc. In some implementations, network 150 can include peer-to-peer communication between devices, e.g., using peer-to-peer wireless protocols (e.g., Bluetooth®, Wi-Fi Direct®, etc.), etc. In some implementations, network 150 may include a wired network, e.g., a gigabit ethernet network and/or a wireless network, e.g., an 802.11 network, a Zigbee® network, etc.

In the example illustrated in FIG. 1, server device 104 is illustrated as providing a first plurality of application execution environments 110 a-110 n (referred to individually as 110 a, 110 b, . . . , 110 n, and collectively as 110), and server device 106 is illustrated as providing a second plurality of application execution environments 112 a-112 n (referred to individually as 112 a, 112 b, . . . , 112 n, and collectively as 112). A server device may provide any number of application execution environments, e.g., one application execution environment, or two or more application execution environments. For example, the number of application execution environments provided by a server device may be based on a number and type of software applications to be executed within the application execution environments on the server device, hardware configuration of the server device, connectivity of the server device to other devices, network bandwidth available to the server device, etc.

An application execution environment as described herein can be any software environment that supports execution of a software application. For example, an application execution environment may be an operating system (e.g., Linux, Windows, Unix, etc.), a hypervisor that supports execution of one or more virtual machines (e.g., Xen®, Oracle VM Server, Microsoft Hyper-V™, VMWare® Workstation, VirtualBox®, etc.), a virtual computer defined by a specification, e.g., a Java Virtual Machine (JVM), an application execution container (e.g., containers based off Linux CGroups, Docker, CoreOS, or the like), a process executing under an operating system (e.g., a UNIX process), etc. In some implementations, the application execution environment may be a software application, e.g., that is configured to execute on server hardware.

Each application execution environment may be configured to support execution of any number of software applications. For example, application execution environment 110 a is illustrated as having a plurality of applications (120, 130, 132, and 134) executing within the application execution environment. Each of the plurality of applications may have a respective portion of the memory of server device 104 allocated to it, e.g., app memory 180-186, as illustrated in FIG. 1.

In some implementations, a portion of the memory allocated to an application may be shared between the application and the application execution environment 110 a. In these implementations, both the application and the application execution environment are configured to access the memory, e.g., to read or write data. These implementations may provide a benefit that data accessed from a storage device can be written directly into application memory, without having to perform a context switch between the application and application execution environment. Further, applications may be able to access storage hardware directly, without the context switch. In some implementations, the application memory is reserved for use by the application and is not shared with the application execution environment.

As illustrated in FIG. 1, application 120 includes a storage driver (122) that stores data regarding storage container(s) 124 allocated to the application, per techniques of this disclosure. Storage container(s) 124 may be one, two, or more storage containers. In this example, storage driver 122 is part of application 120 itself and is not provided separately within the application execution environment. Storage driver 122 is configured to provide application 120 access to storage devices coupled to server device 104.

Further, other applications (130, 132, 134) are illustrated as accessing a storage driver (140) provided within the application execution environment 110 a. Storage driver 140 may be a software application that is configured to provide other applications within an application execution environment access to one or more storage devices coupled to server device 104, e.g., storage device(s) 170 coupled to server device 104 as direct-attached storage devices and/or any of storage devices 160-164.

In some implementations, storage drivers for various applications, e.g., storage driver 122 included in application 120 and storage driver 140 that provides storage for applications 130, 132, and 134, may communicate with each other. In some implementations, the communication between the storage drivers may be in a peer-to-peer manner, e.g., as illustrated in FIG. 1 by peer-to-peer connection 152. In some implementations, e.g., when three or more storage drivers communicate with each other, such communication may be performed using a mesh connection between the storage drivers (e.g., a software-defined mesh).

For example, storage driver 122 and storage driver 140 may send control plane messages to each other, e.g., to arbitrate access to storage devices. For example, if three applications issue storage access commands, each storage command may correspond to one or more storage devices that are part of a storage container allocated for each respective application. In a mesh configuration, where a respective storage driver for each application communicates with storage drivers for other applications, control plane messages may be used by each storage driver to avoid conflict in accessing the physical storage device. Similar communications may be handled in a peer-to-peer manner between storage drivers of any pair of applications. In both the above examples, the storage driver communicates directly with the storage device, while using communication with other storage drivers for control plane signaling messages.

In some implementations, a centralized master, e.g., implemented in any of the storage drivers (e.g., storage driver 122 or storage driver 140) or as part of an operating system (e.g., part of boot-up configuration) of a server device that provides the applications, may be configured such that it is responsible for storage container configuration. In this example, the centralized master may receive control plane messages, and provide instructions to each storage driver to access a storage device in a manner that eliminates conflict between different storage drivers.

In some implementations, storage driver 122 may store information regarding storage (e.g., non-volatile storage) configured for use by application 120. In the example illustrated in FIG. 1, storage driver 122 stores information for a storage container 124 configured for application 120. Similarly, storage driver 140 may store information regarding storage configured for access by each respective application, e.g., storage container(s) 142 corresponding to application 130, storage container(s) 144 corresponding to application 132, and storage container(s) 146 corresponding to application 134.

In some implementations, information for storage containers 124 and/or storage containers 142, 144, and 146 may include identification information of one or more storage devices (e.g., storage devices 160, 162, and 164) that store data for a corresponding application. For example, data for application 130 may be stored in a plurality of storage devices, and information regarding individual storage units (e.g., memory cells, pages, blocks, chips, etc.) that store data for application 130 may be accessible from storage container 142. As used herein, storage container refers to a software-defined aggregation of storage units that may be part of an individual storage device (e.g., an SSD drive) or may be spread across multiple storage devices.

FIG. 2 is a flow diagram illustrating one example of a method 200 to access a device to fulfill an input-output (IO) request, according to some implementations. In some implementations, method 200 is performed in response to detecting that an application (e.g., a software application) has launched, upon receiving a first IO request from an application, upon receiving a new IO request from a previously launched or suspended application, etc. The method 200 may be implemented by a device access module included within the software application, e.g., by incorporating a software library that includes code for the device access module, and/or as a separate device access module, as explained with reference to FIGS. 3A and 3B.

In block 202 of method 200, one or more IO request(s) are received from an application by the device access module. For example, the IO requests may be requests to write data to storage or to read data from storage. The storage may be a storage device included in a computing device that implements method 200, a direct-attached storage device coupled to a computing device that implements method 200, or a network-based storage device that is accessible by a computing device that implements method 200. In another example, the IO requests may be to access a network device, e.g., another computing device such as another server or server system, a network appliance, a networked storage device, etc. In some implementations, the IO requests may include storage access requests, requests to access a network device, or both.

In some implementations, an IO request may include a single IO operation. For example, the single IO operation may be an operation to write new data to a storage device (C), read data from a storage device (R), update data on a storage device (U), or delete data from a storage device (D). Similar operations are also possible for a network device, e.g., a server or computing device, a network appliance, etc. For simplicity, the rest of this document refers to the IO operations as CRUD operations. A CRUD operation may be understood as any of a create, read, update, or delete operation, performed by accessing a storage device and/or a network device. In some implementations, an IO request may include multiple IO operations. In some implementations, the different IO operations may be performed by accessing the same storage or network device, or by accessing respective devices for each of the different IO operations. In some implementations, an IO request may specify one or more IO parameter values that are to be used to fulfill the IO request. IO parameters are discussed below with reference to block 208. Block 202 may be followed by block 204.

In block 204, it is determined by the device access module whether a profile is known for the application. The application profile may be based on an application type and may include one or more parameters determined for the application. For example, the application type may be, e.g., an online transaction processing (OLTP) application that utilizes a traditional relational database, e.g., that supports the structured query language (SQL) and provides atomicity, consistency, isolation, durability (ACID) guarantees; an OLTP application that utilizes a NoSQL database, a key/value store, etc.; a backup or virtual desktop infrastructure (VDI) that utilizes binary large object (BLOB) storage, etc. Other types of applications and sub-types within these types are possible. The application profile may also be based on a network port associated with the application, a programming language used for the application code (e.g., an interpreted language such as JavaScript, Python, etc., a compiled language such as C++, etc.), an application execution environment for the application (e.g., Java Virtual Machine, Linux or other operating system, an execution container that specifies a particular combination of one or more of hypervisor, operating system, database, and other software components, etc.), or the application program code (e.g., source code such as JavaScript, Python, etc.; compiled executables; or intermediate representations, e.g., bytecode), etc.

If it is determined in block 204 that the application profile is known, block 204 may be followed by block 206. If it is determined in block 204 that the application profile is not known, block 204 may be followed by block 220.

In block 220, the application profile may be obtained by the device access module. For example, an application identifier for the application may be determined, e.g., based on a process name, a name of a running executable file of the application, the IO requests generated by the application, etc. Based on the application identifier, an application type may be determined, and a corresponding application profile may be selected from available profiles. Alternatively, the application type may be specified in a configuration setting. In some implementations, e.g., when the application identifier cannot be determined or does not correspond to a known application type, a default profile may be used.

In some implementations, the application profile may be determined by the device access module based on prior IO requests from the application. For example, prior IO requests from the application (e.g., within a prior period of time such as one minute, one hour, one day, or other periods of time) may be analyzed to determine the application profile. In this example, prior IO requests that have a time of arrival within N units of time of a current IO request may be considered to determine the application profile. In some implementations, a certain number (e.g., ten thousand, one million, etc.) of prior IO requests may be utilized to determine the application profile.

For example, the prior IO requests may be analyzed to determine a respective proportion of different types of IO operations in the prior IO requests, e.g., a proportion of each of create, read, update, and delete (CRUD) operations. For example, a pattern of IO requests may be determined for the application, e.g., 10% C, 10% D, 40% U, 40% R; 70% C, 20% D, 0% U, 10% R, etc. The pattern of IO requests may be included in the application profile and may be utilized to determine one or more parameter values, e.g., as described with reference to block 208.
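As a minimal sketch of how such a pattern could be computed, the following assumes that each prior IO operation has already been reduced to a one-letter code ("C", "R", "U", or "D"). The function name `crud_pattern` and the sample history are hypothetical and are not part of the described implementations.

```python
from collections import Counter


def crud_pattern(prior_ops):
    """Return the proportion of each CRUD operation type in prior_ops.

    prior_ops is an iterable of one-letter codes: "C", "R", "U", or "D".
    Codes outside CRUD are ignored (an assumption of this sketch).
    """
    counts = Counter(prior_ops)
    total = sum(counts[op] for op in "CRUD")
    if total == 0:
        # No usable history: report an all-zero pattern.
        return {op: 0.0 for op in "CRUD"}
    return {op: counts[op] / total for op in "CRUD"}


# Hypothetical history: 1 create, 1 delete, 4 updates, 4 reads.
history = ["C", "D"] + ["U"] * 4 + ["R"] * 4
pattern = crud_pattern(history)
```

For this history the resulting pattern corresponds to the first example above (10% C, 10% D, 40% U, 40% R).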

In some implementations, the prior IO requests may be analyzed to determine a proportion of the IO requests that result in a cache invalidation or a cache miss, e.g., when the method is implemented by a device access module that utilizes caching to service IO requests. For example, the proportion of IO requests may be included in the application profile and may be utilized to determine a value of one or more parameters, e.g., the cache type parameter, the read-buffer parameter, and the write-buffer parameter, e.g., as described with reference to block 208. In some implementations, a rate of IO request arrival may be determined based on the prior IO requests. For example, the rate of request arrival may be determined based on an average duration of time (or a median duration of time, or other statistical value) between consecutive IO requests in the prior IO requests. The rate of request arrival may be stored in the application profile, and utilized to set parameter values, as described with reference to block 208.

In some implementations, the prior IO requests may be grouped based on a time of arrival of each request. For example, the grouping may be based on a sampling period, e.g., 30 seconds, one minute, five minutes, etc. The groups may each be associated with a respective time period, and be referred to based on time, e.g., if a current time is t, the most recent group may be numbered 1, the next-most-recent group may be numbered 2, and so on. Grouping the prior IO requests in this manner may provide a benefit by averaging out variation in the characteristics and allowing a value of a particular characteristic of the prior IO requests to be determined.

In some implementations, a respective weight may be assigned to each request group. For example, recent groups may be assigned a higher weight, and older groups may be assigned lower weights. Such assignment of weights ensures that recent IO requests (indicative of a recent pattern of access from the application) have a greater impact on the application profile. For example, the weights may decrease linearly, or non-linearly, based on a difference between the current time t and the sampling period during which requests in each request group were received. Assignment of weights in this manner ensures that a first request group associated with a recent sampling period is assigned a higher weight than a second request group associated with an earlier sampling period.

Values of a particular characteristic of the prior IO requests may be determined based on the grouping, and the respective weights, if assigned. For example, the particular characteristic may be a statistical value of a characteristic, e.g., average size in bits of the prior IO requests, a proportion of each of CRUD operations in the prior requests, etc.
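The grouping and weighting described above can be sketched as follows. This is an illustrative example only: the helper names `group_by_period` and `weighted_average_size`, the `(arrival_time, size_bits)` tuple layout, and the geometric `decay` weighting are assumptions chosen for the sketch; a linear weighting would fit the description equally well.

```python
def group_by_period(requests, now, period):
    """Group (arrival_time, size_bits) tuples by sampling period.

    Returns a list of groups, most recent first. Index 0 here corresponds
    to the group numbered 1 in the text.
    """
    groups = {}
    for arrival, size in requests:
        index = int((now - arrival) // period)
        groups.setdefault(index, []).append(size)
    if not groups:
        return []
    return [groups.get(i, []) for i in range(max(groups) + 1)]


def weighted_average_size(groups, decay=0.5):
    """Weighted mean request size across groups.

    The group at index k gets weight decay**k, so older groups count less
    (a non-linear decrease). Empty groups are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for k, sizes in enumerate(groups):
        if not sizes:
            continue
        weight = decay ** k
        weighted_sum += weight * (sum(sizes) / len(sizes))
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical history: arrival times in seconds, sizes in bits.
requests = [(95, 4096), (99, 4096), (50, 8192), (10, 16384)]
groups = group_by_period(requests, now=100, period=30)
average = weighted_average_size(groups)
```

Because the two most recent requests are small, the weighted average lands closer to 4096 bits than an unweighted mean of the group averages would.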

In some implementations, a size in bits of the prior IO requests (e.g., an average size, a maximum or minimum size, a median size, etc.) may be analyzed. In some implementations, a frequency distribution of the size in bits of the prior IO requests may be determined. In some implementations, the size in bits and/or the frequency distribution may be utilized to determine a bandwidth (e.g., storage access bandwidth, network access bandwidth, etc.) utilized by the application. The determined bandwidth may be stored as a bandwidth requirement of the application in the application profile. For example, the bandwidth requirement may be utilized in block 206 to identify a device to fulfill the IO request, and/or in block 208 to set values of one or more of the IO parameters. For example, if the frequency distribution is the normal distribution, the read-buffer and/or the write-buffer parameters may be set to a value that is within a range of the mean of the frequency distribution, e.g., within three sigma of the mean. Block 204 may be followed by block 206.
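As one possible reading of the three-sigma example, the sketch below picks the upper end of the range (mean plus three standard deviations), so that nearly all requests drawn from a normal distribution fit in the buffer. The function name `buffer_size_from_sizes`, the choice of the upper bound, and the sample sizes are assumptions for illustration; any value within the range would satisfy the description.

```python
import statistics


def buffer_size_from_sizes(sizes, sigmas=3.0):
    """Suggest a buffer size from observed request sizes.

    Returns mean + sigmas * population standard deviation, truncated to
    an integer. Sizes are in whatever unit the caller uses consistently.
    """
    mean = statistics.mean(sizes)
    stdev = statistics.pstdev(sizes)
    return int(mean + sigmas * stdev)


# Hypothetical observed sizes with mean 5 and standard deviation 2.
suggested = buffer_size_from_sizes([2, 4, 4, 4, 5, 5, 7, 9])
```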

In block 206, a device is identified by the device access module to complete the IO request. In some examples, the IO request from the application may specify the device, e.g., by a storage device identifier, by a network identifier, etc. For example, the storage device may be identified by a port (e.g., USB port) to which the storage device is connected, when the storage device is external to a computing device that implements the method. In another example, the storage device may be identified by a storage container identifier, where the storage container corresponds to a collection of storage devices, e.g., SSD devices that include a number of flash memory chips. In another example, the storage device may be identified by a device name (e.g., Unix volume, mount point, or other identifier). In some examples, the storage device may be identified by a network port and/or protocol that is used to couple the storage device to the computing device that implements the method. For example, network-attached storage devices may be identified in this manner.

In some implementations, the IO request may not specify the device. In these implementations, one or more devices, e.g., storage hardware, network devices, etc., may be identified that can be accessed to fulfill the IO request. For example, the device may be identified based on a type of operation specified in the IO request, e.g., create, read, update, or delete. In some implementations, the device may also be identified based on the application type. Block 206 may be followed by block 208.

In block 208, one or more IO parameters are selected to access the device identified in block 206. For example, an application template that specifies one or more IO parameters to access a device to fulfill IO requests generated by the application may be selected. For example, selecting the application template may include setting values of the IO parameters based on the application profile and/or identified device. In some implementations, IO requests may be received from a plurality of applications, each with a corresponding application profile. In some implementations, different application templates (with different values of IO parameters) may be used for different application types. Further, the IO requests may be received during different execution stages of an application. In some implementations, different application templates may be used during different execution stages of the same application.

In implementations where the application profile is not known, e.g., when a default profile is selected in block 220, a default application template that specifies default values of IO parameters may be utilized. In some implementations, an application template may be created based on analyzing the IO requests from the application. For example, for an application that is initially configured to use the default template, values of one or more IO parameters in the default template may be modified based on the analysis of IO requests from the application to generate an application-specific template. Further, the application type may be determined at runtime, e.g., while fulfilling IO requests from the application, and a corresponding application template may be utilized.

The application profile may include, e.g., a type of IO operations included in IO requests from the application. For example, an application that utilizes a traditional relational database management system (RDBMS) may generate portable operating system interface (POSIX)-compliant IO operations, while another application that uses a NoSQL database may generate IO operations as simple key-value pairs. In another example, a backup or virtual desktop infrastructure (VDI) application may generate IO operations that access binary large object (BLOB) data in storage.

In some implementations, e.g., when the IO requests are requests to access a storage device, the application profile may include parameters such as a storage capacity requirement (e.g., “very large” for an application that stores videos or multimedia; “small” for an application that stores key-value pairs, e.g., page visit counts for web pages; etc.), a storage bandwidth requirement (e.g., “high” for an application that accesses a large amount of data in a short time interval, e.g., a video application), a storage access type (e.g., “read-only” for data warehousing applications, “append-only” for applications that generate and store logs, “read and write” for database applications, etc.), and a storage block size (e.g., 1 MB for an application that uses an object-API to write data to storage; 4 KB for an application that writes key-value pairs that are of size 4 KB each, etc.).

In some implementations, such parameters in the application profile may be used to set or update hardware settings on the storage device (e.g., a page size of an SSD device, an error-detection/error-correction code on a hard disk, etc.). In some implementations, other parameter values, such as a firmware capability of the storage device, an age (e.g., an average age, a median age, or other statistical value, or a frequency distribution of age) of the storage device or individual storage units (e.g., flash memory chips) of the storage device, and historical error rates of storage devices accessed by the application, may also be included in the application profile.

Cache Type Parameter

In accessing a device, e.g., a storage device, some applications may benefit from the use of a cache, e.g., a storage cache. Based on the application type, a parameter may include a cache type parameter that specifies whether a cache is to be used to fulfill IO operations (CRUD) in the IO requests from an application, and a type of the cache to be used. In some implementations, the value of the cache type parameter may be determined by the device access module based on the type of operations in the IO requests from the application.

For example, a cache type parameter may indicate that a write-back cache is to be used for a particular application, e.g., such that writes to actual storage or network locations are deferred, e.g., until the time the particular portion of the cache is to be updated by writing data to a device. For example, the cache type parameter may indicate that a write-back cache be used for an application that utilizes a traditional RDBMS.

For example, an RDBMS or an application that utilizes an RDBMS may generate IO requests that include any of the CRUD operations, e.g., using a POSIX-compatible application programming interface (API). Such applications may benefit from the use of a write-back cache. Since such applications can perform in-place updates of data, the cache used for write operations is selected so as to ensure that values corresponding to keys that are previously stored are updated per the most recent IO request to write to the key. Since the application requires a guarantee that the most recent value is written, write-back is selected as the value of the cache type parameter for the application.

In another example, the cache type parameter may indicate that a write-direct cache is to be used for a particular application. For example, a write-direct cache may be used for an application that utilizes a NoSQL database. For example, if the NoSQL database is utilized such that the IO requests include only create or read operations (e.g., as is the case when using a key-value API with an append-only option for data), the application may benefit from the use of a write-direct cache. Since the application writes data only as create operations, e.g., a new value corresponding to a previously stored key is appended in the data storage device and automatically invalidates the previously stored value, the cache can be implemented as a write-direct cache where key-value pairs from the IO request are cached and written to the device when the device is available. Since the application does not require in-place updates, write-direct is selected as the value of the cache type parameter for the application. In some implementations, the cache type parameter can include one or more sub-parameters, e.g., maximum size of the cache; a rate at which data from the cache is flushed to a storage device; cache eviction policy, e.g., least-recently used (LRU), most-recently used (MRU), oldest first, etc.; a unit of cache size (e.g., 4 K, 64 K, 128 K, etc.).

In some implementations, an application may never generate any update operations, and instead may only perform append operations (A). In these implementations, the cache may be bypassed to perform a write directly to the device (e.g., storage device) via zero copy. In these implementations, contents of the cache do not become invalid after the write because the application does not use update operations.

In some implementations, update (U) operations from an application may be implemented by using a read-before-write paradigm. In these examples, a current value of data may be read from the device, prior to writing a new value of the data. If the read-before-write paradigm is utilized, the cache type parameter is selected such that a write-back cache is used. If a write-back cache is used, zero copy (e.g., direct copy from a device to memory), explained with reference to FIGS. 4 and 5 below, is not utilized. If the read-before-write paradigm is not utilized, e.g., when updates overwrite previously stored data values, a write-ahead (or write-direct) cache may be utilized. If a write-ahead cache is used, zero copy can be utilized.

In another example, for other applications, e.g., that utilize BLOB storage, the parameter may specify that no cache is to be used. For example, an application that specifies IO requests that utilize an object API may not benefit from the use of a cache. Since the IO requests from the application are only of type C, R, or D, and do not include updates, “no-cache” may be selected as the value of the cache type parameter for the application. Other types of applications may utilize other types of caches.
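The cache type selection described in this section can be summarized as a small decision function. The sketch below is illustrative only: the function name `select_cache_type`, its arguments, and the fallback for operation mixes that the text does not cover are assumptions, not part of the described implementations.

```python
def select_cache_type(op_types, read_before_write=True, uses_object_api=False):
    """Pick a cache type parameter value from the observed operation types.

    op_types is an iterable of one-letter CRUD codes seen in IO requests.
    """
    ops = set(op_types)
    if uses_object_api and "U" not in ops:
        # Object/BLOB access with only C, R, or D operations.
        return "no-cache"
    if "U" in ops:
        # Updates with read-before-write need write-back; updates that
        # simply overwrite may use a write-direct (write-ahead) cache.
        return "write-back" if read_before_write else "write-direct"
    if ops <= {"C", "R"}:
        # Append-only key-value style access.
        return "write-direct"
    # Assumption of this sketch: fall back to write-back for any other mix.
    return "write-back"


rdbms_cache = select_cache_type("CRUD")
nosql_cache = select_cache_type("CR")
blob_cache = select_cache_type("CRD", uses_object_api=True)
```

Here the three sample calls correspond to the RDBMS, append-only NoSQL, and BLOB storage examples in the text.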

For applications for which the cache type parameter has a value that indicates that a cache is to be implemented, a portion of the physical memory of the computing device that implements the method may be allocated to support transient in-memory storage of data corresponding to IO requests generated by the application. For example, the portion of the physical memory may be allocated as buffers where data is copied during fulfillment of an IO request. For example, data generated by the application that is to be written to a device, e.g., by create or update operations, may be stored in the buffers. Similarly, for read operations, data retrieved from a device may be stored in the buffers.

Read-Buffer and Write-Buffer Parameters

One or more further parameters may specify a size of the buffers for an application. The size of the buffers may be based on the application type. A read-buffer parameter may be used to specify a size of a read buffer. For example, small read buffers, e.g., of a size between 4 KB and 64 KB, may be used for NoSQL applications; medium read buffers, e.g., of a size between 64 KB and 256 KB, may be used for online transaction processing (OLTP) applications that use an RDBMS; and large read buffers, e.g., between 256 KB and 1 MB, or larger, may be utilized for other applications, e.g., virtual desktop infrastructure (VDI) or backup applications.

A write-buffer parameter may be used by the device access module to specify a size of a write buffer. For example, medium write buffers, e.g., between 64 KB and 256 KB, may be utilized for NoSQL or RDBMS applications, while large write buffers, e.g., 256 KB to 1 MB, or larger, may be used for other applications.

Queue Parameter

In some implementations, queues may be implemented by the device access module for the operations in the IO requests. For example, in some implementations, separate queues may be implemented for C, R, U, and D operations. Queues may be implemented in the physical memory of a computing device that implements the method. Queues may hold metadata, e.g., pointers to buffers that are configured to store data for the application. In some implementations, queues may be implemented in a lock-free manner (e.g., when the cache type is write-direct). For example, for applications that utilize a NoSQL database or other key-value based accesses, or that access data on a device as objects (e.g., binary large objects, or BLOBs), queues may be implemented in the lock-free manner, e.g., such that multiple concurrent IO operations that access the same data, e.g., same key-value pair, are permitted. In these implementations, the queue parameter may be set to a value “lock-free.”

In some implementations, queues are implemented utilizing locks, such that only one IO operation at a time can access particular data. For example, queues utilizing locks may be implemented for OLTP applications that utilize an RDBMS. In these implementations, the queue parameter may be set to a value “locked.” In some implementations, such applications can alternatively be implemented with lock-free queues (with the queue parameter set to “lock-free”), e.g., if the cache is locked such that concurrent IO operations to the same key-value pair (or other data) are denied.

In some implementations, the queue parameter may specify whether separate queues are to be utilized for different operation types (CRUD), in addition to or as an alternative to the “locked” or “lock-free” value of the parameter. For example, the queue parameter may specify that separate queues are to be used for different operations, e.g., individual queues for each of C, R, U, and D operations. In this example, the queue parameter may specify “separate” or “4” to indicate that four different queues are to be implemented. Any number of queues may be implemented. For example, two queues may be implemented such that read operations are in a first queue, while create, update, and delete operations are in a second queue.

Implementing queues and/or buffers may allow grouping multiple IO operations (also referred to as batching) into a single device access or splitting a single IO operation (also referred to as chunking) into multiple device accesses.

For example, batching may be advantageous, e.g., when a size of data in a particular IO operation is smaller than a size of an individual unit of memory, e.g., when an IO operation specifies access to 30 bytes of data, while memory units are sized 1 KB. Some applications, e.g., that utilize key-value or NoSQL databases, may generate a large number of IO requests that are associated with small data values, e.g., read operations that specify key values of 30 bytes. Batching may be advantageous by combining multiple operations into a single device access, thus distributing the overhead of device access across the multiple operations.

Similarly, chunking may be advantageous, e.g., when a size of data in a particular IO operation is larger than a size of an individual unit of memory, e.g., when an IO operation specifies access to 10 MB of data, while memory units are sized 64 KB. Some applications, e.g., that utilize BLOB storage, may generate a large number of IO requests that are associated with large data values, e.g., write operations that specify objects of 10 MB. Chunking may be advantageous by splitting the single write operation into multiple accesses to a storage device such that each write to the storage device corresponds to a smaller size of data, e.g., 64 KB.
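Batching and chunking can be illustrated with the two example sizes given above. The following is a minimal sketch in which payloads are plain byte strings; the function names `batch` and `chunk` and the greedy packing strategy are assumptions made for illustration.

```python
def chunk(data, unit_size):
    """Split one large payload into pieces of at most unit_size bytes."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def batch(payloads, unit_size):
    """Greedily pack small payloads into groups of at most unit_size bytes.

    Each returned group would be sent as a single device access. A payload
    larger than unit_size ends up alone in its own group; in practice it
    would be chunked first.
    """
    batches, current, used = [], [], 0
    for payload in payloads:
        if current and used + len(payload) > unit_size:
            batches.append(current)
            current, used = [], 0
        current.append(payload)
        used += len(payload)
    if current:
        batches.append(current)
    return batches


# 10 MB object written in 64 KB units.
chunks = chunk(bytes(10 * 1024 * 1024), 64 * 1024)
# One hundred 30-byte operations packed into 1 KB units.
batches = batch([bytes(30)] * 100, 1024)
```

With these inputs, the 10 MB write is split into 160 device accesses of 64 KB each, and the 100 small operations are combined into 3 device accesses.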

Journaling Parameter

In some traditional implementations, journaling is used, e.g., when storing data to a storage device in response to an IO operation. Journaling may be implemented as a feature of a filesystem and may allow, e.g., rollback of one or more IO operations, provide a sequential record of IO operations, etc. In the described implementations, method 200 may be implemented as part of a device access module that accesses a device directly, e.g., by specifying physical storage addresses of a storage device. The device access module may selectively choose to implement journaling, e.g., based on the application type.

For example, journaling is beneficial to some applications, e.g., applications that require guarantees that data written to a storage device is retrievable. For such applications, the journaling parameter may be set to “yes.” In these implementations, create or update operations may be performed by the device access module such that modifications are made to storage units in a storage device in a manner that can be rolled back, and that provides guarantees that the written data is retrievable.

Some applications, e.g., applications that write key-value pairs that are frequently updated (e.g., Internet-of-Things applications that write new values for keys at frequent intervals, e.g., sensor readings for instantaneous temperature, etc.) may not substantially benefit from the overhead of journaling. For such applications, the journaling parameter may be set to “no.” For applications that generate update operations that do not utilize the read-before-write parameter, the journaling parameter is set to “no.” For applications that generate update operations that utilize the read-before-write parameter, the journaling parameter is set to “yes.”

Mapping Parameter

In some implementations, an IO request from an application may specify a particular address, e.g., a storage unit address, a network address, etc., from where to read data, or to which data is written. In some implementations, the address may be a logical block address. Depending on the type of application, the logical block address may be mapped to a physical address, such as a particular page (or another storage unit) in an SSD device, by the device access module. Logical block address (LBA) to physical block address (PBA) mapping may be implemented with an in-memory implementation, e.g., a HashMap. In different implementations, the hashmap can be locked or lock-free. In some implementations, a mapping parameter may be utilized to specify whether the mapping is implemented in a locked or lock-free manner.

For some applications, e.g., applications that perform IO operations using a read-before-write paradigm, the mapping between logical and physical addresses may be locked, e.g., the mapping parameter may be set to the value “locked.” Such implementations may ensure data integrity, e.g., by performing create or update operations in a manner that ensures that a single copy of data is accessed. A locked mapping may be utilized, e.g., by RDBMS or similar applications.

For some applications, e.g., applications that perform IO operations without using the read-before-write paradigm, the mapping parameter may be set to the value “lock-free.” For example, applications that utilize a NoSQL database or object storage may be tolerant of multiple versions or copies of data, since the application may have built-in features to determine the correct version (e.g., based on a creation timestamp). In another example, the mapping parameter may be set to “lock-free,” e.g., when it is known (based on the application type, prior IO requests, etc.) that the application does not generate an update (U) request.
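The effect of the mapping parameter can be sketched with an in-memory dictionary that is either guarded by a lock or accessed without one. The class name `AddressMap` and its methods are hypothetical. Note that the “lock-free” branch below simply skips the lock to show how the parameter switches behavior; it is not an implementation of a true lock-free hash map.

```python
import threading
from contextlib import nullcontext


class AddressMap:
    """Maps logical block addresses (LBA) to physical block addresses (PBA)."""

    def __init__(self, mapping_parameter="locked"):
        if mapping_parameter not in ("locked", "lock-free"):
            raise ValueError("mapping_parameter must be 'locked' or 'lock-free'")
        self.mapping_parameter = mapping_parameter
        self._table = {}
        # A real lock serializes access; nullcontext() is a no-op guard.
        if mapping_parameter == "locked":
            self._guard = threading.Lock()
        else:
            self._guard = nullcontext()

    def update(self, lba, pba):
        """Record that logical block lba is stored at physical block pba."""
        with self._guard:
            self._table[lba] = pba

    def lookup(self, lba):
        """Return the physical block for lba, or None if unmapped."""
        with self._guard:
            return self._table.get(lba)


# Hypothetical usage: an RDBMS-style application uses the locked mapping.
rdbms_map = AddressMap("locked")
rdbms_map.update(7, 4096)
physical = rdbms_map.lookup(7)
```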

Error Tolerance

Some applications may be highly tolerant of errors in the fulfillment of IO requests. For example, applications that maintain multiple copies of data in storage, e.g., in a fault-tolerant or redundant fashion within a single server, or across different servers, may be designed to tolerate IO errors. For example, IO requests from these applications, e.g., to read certain data, may be simultaneously sent to the multiple copies, and failure of any individual request may be tolerated since other requests may be successfully completed.

In another example, some applications, e.g., applications that can tolerate high latency, may also be tolerant of errors. For example, if the service guarantee of time of completion for an IO request is substantially smaller than the high latency, simple retries of a failed IO request may suffice.

In another example, some applications may be tolerant of IO requests not being fulfilled, e.g., consecutive writes from an IoT sensor may have insignificant variation in values, and in some instances, the applications may be designed in a fault-tolerant manner, e.g., where absence of individual values does not lead to application-level errors.

In different implementations, the error tolerance ability of an application may be specified in a configuration setting, or may be determined based on one or more of: the application type; an identifier of the application; a type of device accessed by the application (e.g., a storage device that provides guarantees); prior IO requests from the application; etc. Accordingly, the error tolerance parameter for an application may be set, e.g., to a “high,” “medium,” or “low” value. The parameter can also be set as a numeric value that corresponds to the level of errors that an application can tolerate.

In implementations where the application does not specify a device to fulfill IO requests, the device may be selected by the device access module based on the error tolerance parameter. For example, for applications with a “low” value of the error tolerance parameter, a storage device with built-in redundancy (e.g., in a redundant array of independent disks (RAID) configuration) and/or high reliability (e.g., SSD with low wear levels, SSD with high quality flash memory units, storage with error correction capability) may be selected, e.g., by a device access module that services the IO requests. For applications with a “high” value of the error tolerance parameter, less reliable storage devices may be selected to store data, e.g., SSD devices with high wear levels, non-redundant disk configurations, etc. Further, in some implementations, where applications are associated with a “high” value of the error tolerance parameter, the number of retries performed when an IO operation fails may be restricted, e.g., no retries, 1 retry, or less than a threshold number of retries. Selection of storage devices in this manner may save costs, e.g., by allowing cheaper storage to be utilized for applications that can tolerate high error rates. Selection of storage devices may be based on other parameters, besides the error tolerance parameter, e.g., based on a performance specification (e.g., in terms of response time) or other factors.
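One way to picture device selection by error tolerance is the sketch below. The `Device` fields, the wear-level scale, the handling of the “medium” value, and the specific retry limits for “medium” and “low” are all assumptions for illustration; the text only states that “low” favors redundant, reliable devices, that “high” allows less reliable devices, and that retries may be restricted for “high.”

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    redundant: bool      # e.g., part of a RAID configuration
    wear_level: float    # 0.0 (new) to 1.0 (heavily worn); assumed scale


def select_device(devices, error_tolerance):
    """Choose a device based on the error tolerance parameter."""
    if error_tolerance == "low":
        # Prefer redundant devices, then the least worn among them.
        candidates = [d for d in devices if d.redundant] or devices
        return min(candidates, key=lambda d: d.wear_level)
    if error_tolerance == "high":
        # Cheaper, more worn storage is acceptable.
        return max(devices, key=lambda d: d.wear_level)
    if error_tolerance == "medium":
        # Assumption: take the device with the middle wear level.
        ranked = sorted(devices, key=lambda d: d.wear_level)
        return ranked[len(ranked) // 2]
    raise ValueError("unknown error tolerance value")


def max_retries(error_tolerance):
    """Retry limit per tolerance level (values are illustrative)."""
    return {"high": 0, "medium": 3, "low": 5}[error_tolerance]


devices = [
    Device("raid-ssd", redundant=True, wear_level=0.1),
    Device("old-ssd", redundant=False, wear_level=0.9),
    Device("disk", redundant=False, wear_level=0.4),
]
low_choice = select_device(devices, "low")
high_choice = select_device(devices, "high")
```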

Access-Type Optimization Parameter

In some implementations, an access-type optimization parameter may be utilized by the device access module. For example, it may be determined whether the application that generates the IO requests uses a read-before-write paradigm. If the application uses read-before-write, the access-type optimization parameter may be set to a value that specifies optimization order as update, delete, create, read, e.g., a value “UDCR.” In implementations that use read-before-write, a create operation that specifies writing a key-value pair to the device succeeds only if the key is not present, and the operation fails if the key is already present.

In another example, if the application does not use read-before-write, the access-type optimization parameter may be set to a value that specifies optimization order as create, read, delete, update, e.g., a value “CRDU.” In these implementations, a create operation that specifies writing a key-value pair to the device is performed if the key is not present, and if the key is already present, an update operation is performed to update the corresponding value.
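The two create semantics can be contrasted in a short sketch. Here a Python dict stands in for the device, and the function names are hypothetical.

```python
def access_type_parameter(read_before_write):
    """Map the application's paradigm to the parameter values named in the text."""
    return "UDCR" if read_before_write else "CRDU"


def create(store, key, value, read_before_write):
    """Apply a create operation to `store`; return True if a value was written."""
    if key in store and read_before_write:
        # Read-before-write: create fails when the key is already present.
        return False
    # Otherwise the key is created, or the create falls through to an update.
    store[key] = value
    return True
```

For example, with `store = {"k": 1}`, `create(store, "k", 2, read_before_write=True)` leaves the store unchanged and reports failure, while the same call with `read_before_write=False` updates the value to 2.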

The device access module (or other software) that implements the method may read the access-type optimization parameter and service the IO requests from the application accordingly. For example, a priority of servicing IO requests in the CRUD queues (if used) may be determined based on the access-type optimization parameter. The device access module may service IO requests from the application out of order, e.g., to prioritize one type of IO operation over other types, based on the access-type optimization parameter.
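One possible reading of this prioritization is a strict-priority drain of per-operation queues in the order spelled out by the parameter value. The sketch below assumes that reading; the queue layout and function names are hypothetical.

```python
from collections import deque


def make_queues():
    """Build example CRUD queues holding pending request labels."""
    return {
        "C": deque(["c1"]),
        "R": deque(["r1", "r2"]),
        "U": deque(["u1"]),
        "D": deque(),
    }


def service_order(queues, access_type):
    """Yield requests, draining queues in the order given by e.g. "UDCR"."""
    for op in access_type:
        pending = queues[op]
        while pending:
            # Requests of the same type keep their arrival order.
            yield pending.popleft()
```

With the example queues, `list(service_order(make_queues(), "UDCR"))` services the update first, then the create, then both reads. Strict priority can starve the lowest-priority operation type under sustained load, so a real module might blend it with aging or weighted scheduling.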

In another example, a total size of write-buffers or read-buffers may be based on the access-type optimization parameter. In some implementations, e.g., where the application does not specify a device type or device identifier of a device to fulfill an IO request, the device type (e.g., SSD storage, hard-disk storage, tape storage, etc.) and/or device identity (e.g., internal storage device, external storage device, network-attached storage device, etc.) may be selected based at least in part on the access-type optimization parameter.

Storage-Container Parameter

In some implementations, a parameter that specifies whether an application uses storage containers and, optionally, a type of storage container may be utilized by the device access module. Storage containers may be similar to those described in the related U.S. provisional application Ser. No. 62/651,995 filed on Apr. 3, 2018.

For example, the storage-container parameter may specify a type of the storage container, e.g., optimized for throughput, optimized for storage capacity, etc. Further, the type of storage container or storage devices that are utilized for the storage container may be based on other parameters, e.g., the error tolerance parameter.

Block 208 may be followed by block 210.

In block 210, resources may be allocated by the device access module based on the IO parameters. For example, such resources may include buffers, caches, etc.

In some implementations, the size in bits of the prior IO requests as specified in the application profile may be utilized to determine a size of the buffer to allocate for the application. In some implementations, the size of the buffer may be based on a respective proportion of each type of IO operation (create, read, update, delete) in the prior IO requests, indicated in the application profile. For example, a large read buffer may be allocated to applications where the application profile indicates that a relatively large proportion (e.g., 70%) of IO operations in the prior IO requests were of the type R. In some implementations, the allocated buffers may be partitioned, e.g., into sub-buffers, or implemented as separate buffers for different types of IO operations. In these implementations, the size of the buffer for each type of operation may be based on the proportion of that type of operation in the prior IO requests, as indicated in the application profile. In some implementations that utilize a read buffer (e.g., the read-buffer parameter is set to “yes”), a size of the read buffer may be based on a size (e.g., an average size, a total size within a request group, a median size, etc.) of data read in read operations of the prior IO requests, as indicated in the application profile.
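Proportional partitioning of a buffer budget can be sketched as follows. The function name, the 4096-byte allocation unit, and the round-down policy are assumptions for illustration only.

```python
def partition_buffer(total_bytes, op_counts, unit=4096):
    """Split `total_bytes` across operation types in proportion to prior IO.

    `op_counts` maps an operation type ("C", "R", "U", "D") to the number of
    such operations seen in prior IO requests. Each share is rounded down to
    a multiple of `unit` (an assumed allocation granularity).
    """
    total_ops = sum(op_counts.values())
    if total_ops == 0:
        raise ValueError("application profile contains no prior IO operations")
    sizes = {}
    for op, count in op_counts.items():
        share = total_bytes * count // total_ops  # integer proportional share
        sizes[op] = (share // unit) * unit        # align down to the unit
    return sizes


# Example profile: 70% of prior operations were reads.
PROFILE = {"C": 10, "R": 70, "U": 15, "D": 5}
SIZES = partition_buffer(1024 * 1024, PROFILE)
```

Rounding down keeps the sum of the sub-buffers within the budget at the cost of leaving a small remainder unallocated.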

In some implementations, the IO request(s) received in block 202 may include a plurality of IO operations. A respective size of the IO operations may be determined, e.g., a size of data access to fulfill each IO operation (e.g., “read 50 KB,” “write an object of size 1 MB,” “delete 16 bytes,” “update a key-value pair with a total size of 10 KB,” etc.). In some implementations, two or more operations of the plurality of IO operations may be grouped (batching). For example, such grouping may be performed such that a combined size of the data access for the operations is less than or equal to a size of a buffer (e.g., a read buffer, a write buffer, etc.) allocated for the application. In some implementations, it may be determined that a size of an IO operation (e.g., read operation) in the IO request is larger than a size of the corresponding buffer (e.g., read buffer). In response to such determination, the IO operation may be split into a plurality of sub-operations such that a size of each sub-operation (e.g., a size of data read in the sub-operation) is less than or equal to the size of the buffer.

In some implementations, buffers may be allocated in integer multiples of the size of an addressable unit of physical memory of the computing device that implements method 200. Grouping (batching) or splitting (chunking) IO operations as described above may improve IO performance, since the size of IO operations matches the size of allocated buffers, which may reduce or eliminate situations where a buffer is insufficient for an operation, or a buffer is underutilized. The method continues to block 212.
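Batching and chunking against a fixed buffer size might look like the sketch below. It assumes a greedy, order-preserving grouping policy, which is only one of several policies consistent with the description; sizes are in arbitrary units and the function names are hypothetical.

```python
def split_operation(size, buffer_size):
    """Split one IO operation into sub-operation sizes, each <= buffer_size."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    chunks = [buffer_size] * (size // buffer_size)
    if size % buffer_size:
        chunks.append(size % buffer_size)
    return chunks


def batch_operations(sizes, buffer_size):
    """Group operation sizes in arrival order so each batch fits the buffer.

    Operations larger than the buffer are first split into sub-operations.
    """
    batches, current, used = [], [], 0
    for size in sizes:
        for piece in split_operation(size, buffer_size):
            if current and used + piece > buffer_size:
                # The current batch is full; start a new one.
                batches.append(current)
                current, used = [], 0
            current.append(piece)
            used += piece
    if current:
        batches.append(current)
    return batches
```

For instance, with a buffer of 100 units, operations of sizes 50, 16, and 10 share one batch, while an operation of size 200 is split into two full-buffer sub-operations.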

In block 212, one or more devices, e.g., storage devices, may be provisioned and allocated by the device access module to the application that provided the IO request. For example, one or more storage devices that were previously unallocated (or may be partially utilized by another application) may be selected to fulfill the IO request. Block 212 may be followed by block 214.

In block 214, the IO request from the application is fulfilled by the device access module by accessing the device. For example, data may be written to a storage device based on a write operation in the IO request, data may be read from a network device based on a read request, etc. In some implementations, where the device accessed to fulfill the IO request is a storage device, one or more hardware characteristics of the storage device may be determined by the device access module, and used to access the storage device. For example, the one or more characteristics may include a physical type of storage unit in the storage device (e.g., a flash memory cell, a hard disk block, a DRAM cell, etc.), a block size configured for the storage device (e.g., 4 KB, 16 KB, etc.), one or more configuration parameters of the storage device (e.g., serial access capable, parallel access capable, capable of fast reads and slow writes, etc.), or a size of the storage device (e.g., 4 GB, 4 TB, 4 MB, etc.). In some implementations, the application profile may also specify configuration parameters regarding accessing a storage device, e.g., a number of retries, a number of bits per cell for a flash memory device, etc. For example, the application developer may set such parameters (which act as hints to the device access module) based on application functionality (e.g., whether the application is built to tolerate storage errors, whether the application has built-in functionality to retry when a storage IO request fails, etc.). Block 214 may be followed by block 216.

In block 216, the IO requests may be analyzed. For example, analysis of the IO requests may be similar to the analysis of prior IO requests to obtain the application profile, as described with reference to block 220 above. In some implementations, the IO requests may be analyzed by the device access module. In some implementations, the IO requests may be analyzed by an analytics module separate from the device access module. Block 216 may be followed by block 218.

In block 218, the application profile may be updated based on the analysis of IO requests performed in block 216. For example, one or more IO parameters in the application profile may be updated based on the analysis. In various implementations, updating the application profile may be performed by the device access module, by an analytics module, or a combination. Block 218 may be followed by block 202, where further IO requests may be received from the application.

Method 200 provides several technical benefits. For example, by providing application profiles that include IO parameters to access a device, the method enables devices such as storage and/or network devices to be accessed in a manner that is application (or workload) specific and is optimized for the application. One or more settings of the storage device may also be set based on the IO parameters, such that workload-specific optimization is applied to the device hardware (e.g., block size, error correction capability, etc. in an SSD device). The parameters specified in the application profile are utilized, e.g., by a device access module implemented as part of the application and/or separate from the application but executing as a user space process. Different applications running on the same computing device can, therefore, be associated with different IO parameters and thus, each application benefits from an IO configuration suited to the application characteristics.

Further, the IO parameters may be specified statically in the application profile, e.g., by the application developer, based on the application type, etc. and/or learned dynamically, e.g., by analyzing prior IO requests from the application. The IO parameters can be updated at runtime, e.g., by analyzing IO performance when particular IO requests are fulfilled and making adjustments to the IO parameters. IO parameters can help optimize IO, such as storage accesses, for an application, e.g., by providing application-specific caches, IO operation specific queues (e.g., separate queues for each of CRUD), buffers sized to meet application-generated IO, use of a zero copy technique, selective use of journaling, logical-to-physical block address mapping, selective use of data redundancy for applications that require error tolerance, prioritization of specific types of IO operations over other operations, use of storage containers, etc. Different combinations of such parameters allow a computing device that implements method 200 to meet the quality of service expectations from different types of applications, with minimal or no changes to hardware.

Further, the method can be implemented for applications in any execution environment, e.g., an application executing on an OS, an application executing in a virtualized environment, an application executing in a Java virtual machine, etc. In some implementations, method 200 may provide a quality of service (QoS) guarantee for IO operations, e.g., by enabling IO requests to be fulfilled deterministically, e.g., within a predictable range of time from receipt of the request.

While method 200 has been described with reference to various blocks in FIG. 2, it may be understood that techniques described in this disclosure may be performed without performing some of the blocks of FIG. 2. For example, in some implementations, e.g., where the IO requests specify the device to be used, block 206 is not performed.

In some implementations, one or more of the parameters described with reference to block 208 above may be omitted, or other parameters may be used. In some implementations, one or more of the blocks illustrated in FIG. 2 may be combined. For example, blocks 210, 212, and 214 may be combined. In some implementations, the blocks may be performed in different order, e.g., block 208 may be performed before block 206. Other combinations of blocks are possible.

In another example, blocks 216 and 218 may not be performed, e.g., if a computing device that implements method 200 does not have enough computing capacity to analyze IO request data or to update application profiles. In some implementations, blocks 216 and 218 may be performed offline and not in real-time, e.g., separate from fulfilling IO requests.

In some implementations, a sampling technique may be applied to determine whether blocks 216 and 218 are to be performed. For example, upon fulfillment of an IO request, it may be determined whether the IO request fulfills a sampling criterion. The sampling criterion may specify, e.g., that every Nth (e.g., tenth, hundredth, thousandth, etc.) IO request be included in the sample, or that requests be chosen for inclusion in the sample randomly at a sampling rate, e.g., 1%, 5%, 10% of all requests, etc. In some implementations, the sampling technique may be performed over a particular time period (e.g., 1 minute, 1 hour) and repeated or modified in subsequent time periods.
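The two sampling criteria mentioned above can be expressed as small predicate factories. This is a sketch; the function names are hypothetical and the request argument is unused because neither criterion inspects the request itself.

```python
import random


def every_nth(n):
    """Return a criterion that selects every Nth fulfilled IO request."""
    count = 0

    def criterion(_request):
        nonlocal count
        count += 1
        return count % n == 0

    return criterion


def random_rate(rate, rng=None):
    """Return a criterion that selects requests randomly at `rate` (0.0 to 1.0)."""
    rng = rng or random.Random()

    def criterion(_request):
        # rng.random() is uniform on [0.0, 1.0).
        return rng.random() < rate

    return criterion
```

A caller would evaluate the criterion after each fulfilled request and perform the analysis and profile update of blocks 216 and 218 only when it returns True.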

In some implementations, the sampling rate (e.g., the proportion of requests in the sample to all fulfilled IO requests) may be determined dynamically. For example, it may be determined whether the performance of a system that fulfills IO requests meets a performance threshold. If the performance does not meet the threshold, the sampling rate may be increased (and vice versa). In some implementations, multiple performance thresholds may be utilized, each leading to a correspondingly different change in the sampling rate.
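A single-threshold version of this adjustment could be sketched as below. It assumes that performance is measured as latency (lower is better) and that the rate is doubled or halved within fixed bounds; the metric, step factor, and bounds are all illustrative assumptions.

```python
def adjust_sampling_rate(rate, latency_ms, threshold_ms,
                         step=2.0, min_rate=0.001, max_rate=0.5):
    """Raise the sampling rate when latency misses the threshold, else lower it."""
    if latency_ms > threshold_ms:
        # Performance does not meet the threshold: sample more requests.
        rate *= step
    else:
        # Performance meets the threshold: sample fewer requests.
        rate /= step
    # Keep the rate within the assumed bounds.
    return min(max_rate, max(min_rate, rate))
```

Multiple thresholds could be supported by choosing `step` from a table keyed by how far the measured latency is from the target.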

In the implementations that utilize sampling techniques, blocks 216 and 218 are performed if a particular request is included in the sample. Otherwise, blocks 216 and 218 are not performed.

In some implementations, one or more of blocks 208, 210 and 212 may be performed selectively. For example, one or more of blocks 208-212 may be performed only if the application profile is updated in block 218. In some implementations, particular individual blocks of blocks 208-212 are performed based on whether there is an update to the application profile, and on the type of update. For example, if there are no changes to buffers or caches based on the updated application profile, block 210 may not be performed. If no additional devices are to be provisioned based on block 208 and block 210, block 212 may not be performed.

The method 200 described with reference to FIG. 2 may be usable to access a device to fulfill input/output (IO) requests. In some implementations, the device accessed may be a storage device, e.g., a volatile memory (e.g., DRAM, SRAM, etc.) and/or non-volatile memory (e.g., NVRAM, MRAM, flash memory, hard disk drive, phase change memory, 3D XPoint™, resistive RAM, etc.). In some implementations, the device may be a storage device, e.g., a hardware storage device physically coupled to a computing device that implements the driver, a hardware storage device accessible via a network to which a computing device that implements the driver is coupled, both physically-coupled and network-based storage devices, etc.

In some implementations, the device may be a network or compute device, e.g., a network-attached storage device, a server or other computing device accessible at a particular network address, etc.

In some implementations, the method 200 described with reference to FIG. 2 may be implemented as part of a device driver or device access module. Driver, device driver, or device access module, as used herein, refers to software code that is operable to access a device. In some implementations, such code for the device access module may be provided as part of an application, as a standalone executable, as part of an application execution environment or operating system, etc. In some implementations, the software code for the device access module may be provided as part of other software, e.g., storage software, network-access software, hypervisor, other application software, etc. In some implementations, the driver may be implemented in user space, e.g., distinct from the operating system kernel that executes in kernel space.

Implementing the device access module in user space, e.g., as part of an application and/or as a separate executable, may provide certain advantages, e.g., it may allow for a zero copy technique to be utilized. Further, such implementations may be easier to upgrade, e.g., compared to implementations where the device access module is implemented as part of the operating system. User space implementation can also make IO more efficient by reducing context switches between user space and kernel space when performing IO operations. Further, the device access module can provide enhanced security, e.g., when the module is implemented as part of the application itself.

In some implementations, the device access module may be implemented as a static or pre-compiled driver, e.g., with a fixed set of IO parameters and application profiles. In these implementations, the module may support one or more predetermined configurations, and may not adapt during runtime to requests (e.g., IO requests) from different types of applications. The one or more predetermined configurations may include configurations for specific types of applications, e.g., applications that use NoSQL databases such as applications that access or process data from Internet-of-Things devices, sensor data, webpage click data, online advertising data, and the like; OLTP applications that use a relational database; applications that utilize object storage, such as image or video applications, etc. The IO parameters in the application profile may be predetermined based on an identity of the application and/or the configuration of a computing device that services the IO requests.

In some implementations, the device access module may be reconfigured dynamically, e.g., during execution or periodically, e.g., by utilizing the techniques to update the application profile, as described with reference to blocks 216, 218, and 220. In these implementations, implementing the module as a runtime or dynamic driver may offer several benefits over static, pre-compiled drivers. For example, one or more IO access parameters used to access a device may be updated during driver execution, e.g., based on processed IO requests, based on characteristics of one or more devices (e.g., service levels guaranteed by storage hardware) that are accessed to fulfill requests, behavior of other applications that execute on the same computing device, e.g., in a multi-tenant configuration, etc.

FIG. 3A illustrates a block diagram of an example computing device 300 which may be used for one or more implementations described herein. The computing device 300 may be a server system 102, a server device 104 or 106, etc. The computing device 300 may include a processor 331, one or more storage devices 333, peripheral input-output interface(s) 335, a physical memory 337, and a network interface 339. The components of the computing device 300 may be communicatively coupled by a bus 320.

Processor 331 includes an arithmetic logic unit, a microprocessor, a general purpose controller, or another processor array to perform computations and to perform input-output (IO) operations. Processor 331 processes data and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although FIG. 3A includes a single processor 331, multiple processors 331 may be included. Other processors, sensors, displays, and physical configurations may be part of the computing device 300. Processor 331 is coupled to the bus 320 for communication with the other components via signal line 322.

Storage device(s) 333 may be a non-transitory computer-readable storage medium that stores data. Storage device(s) 333 may be a DRAM device, an SRAM device, an MRAM device, hard disk, flash memory, a ReRAM device such as 3D XPoint™, or some other memory device. In some implementations, the storage device 333 can include a compact disk read only memory (CD ROM) device, a digital versatile disk ROM (DVD ROM) device, a DVD RAM device, a DVD re-writable (RW) device, a tape drive, or some other mass storage device. Storage device(s) 333 are coupled to bus 320 for communication with the other components via signal line 326.

In some implementations, peripheral IO interface(s) 335 may also be included in device 300. For example, peripheral IO interface(s) 335 may include a universal serial bus (USB), secure digital (SD), category 5 cable (CAT-5), or similar port for wired communication with direct-attached device(s) 314 that are physically coupled to computing device 300. Peripheral IO interface(s) 335 are coupled to bus 320 for communication with the other components via signal line 328.

Physical memory 337 stores instructions that may be executed by the processor 331 and/or data. The instructions may include code for performing the techniques described herein. The memory 337 may be a dynamic random access memory (DRAM) device, a static RAM, or some other memory device. In some implementations, the memory 337 also includes a non-volatile memory, such as an SRAM device or flash memory, or some other mass storage device for storing information on a more permanent basis. Memory 337 includes code and routines operable to execute the applications 342 and 352, as well as device access modules 344 and 354, which are described in greater detail below. The memory 337 is coupled to the bus 320 for communication with the other components via signal line 324.

During use of computing device 300, physical memory 337 may be partitioned into user-space memory 306 and kernel space memory 308. User space memory 306 may store various applications, e.g., application 342, application 352, etc. In some implementations, an application may include a device access module fully or partially. For example, application 342 includes a device access module 344, e.g., incorporated as a code library. In these implementations, input-output (IO) requests from the application may be processed by device access module 344.

In some implementations, application 352 partially includes device access module 354. In these implementations, portions of executable code of device access module 354 are implemented separately from the application 352. For example, a portion of device access module 354 may be incorporated as a code library in application 352, while other portions are implemented separately from application 352. In these implementations, input-output (IO) requests from the application may be processed by device access module 354.

In some implementations, a device access module may be implemented as a standalone application (not shown). In these implementations, application code excludes device access code, and input-output requests from the application may be sent to the standalone device access module. Device access module(s) 344 and 354 may include software code that implements method 200 to access a device to fulfill an IO request.

In some implementations, kernel space memory 308 may be accessible by an operating system of the computing device 300 and may be restricted from access by software applications (e.g., application 342, application 352, device access modules 344 and 354, etc.).

Network interface 339 transmits and receives data to and from a network 310. Network 310 may couple device 300 with network device(s) 312. In some implementations, network interface 339 includes a wired (e.g., Ethernet, Gigabit Ethernet), wireless, or optical interface to network 310 (e.g., via a network switch, router, hub, etc.). In some implementations, network interface 339 includes a wireless transceiver for exchanging data using one or more wireless communication methods, including IEEE 802.11, IEEE 802.16, Bluetooth® or another suitable wireless communication method. In some implementations, the network interface 339 includes a cellular communications transceiver for sending and receiving data over a cellular communications network. In some implementations, network interface 339 includes a wired port and a wireless transceiver. Network interface 339 is coupled to the bus 320 for communication with the other components via signal line 330.

FIG. 3B illustrates a block diagram of the example computing device 300 (certain elements shown in FIG. 3A are omitted for clarity). As illustrated in FIG. 3B, applications 342 and 352 may execute on the computing device 300, within an execution environment 340 that is stored in a user-space partition 306 of physical memory 337. Application 342 may include software code that implements device access module 344. Device access module 344 may include a device access module cache 346 that includes a request queue 347 and/or a response queue 348, based on IO parameters in the application profile for application 342. Application 352 may include software code that implements device access module 354. Device access module 354 may include a device access module cache 356 that includes a request queue 357 and/or a response queue 358, based on IO parameters in the application profile for application 352. While two applications are illustrated in FIG. 3B, any number of applications may execute within execution environment 340. For example, in a single tenant configuration, only one application may execute in the execution environment 340, while in a multi-tenant configuration, two, three, or any number of applications may execute in the execution environment 340.

While FIGS. 3A and 3B show two applications 342 and 352 that each include a respective device access module, it may be possible to implement the device access module separate from the application, e.g., as a standalone module 364, executing as a user space application. Further, in some implementations, device access modules 344 and 354 (within applications 342 and 352, e.g., included using a code library) may be implemented together with standalone device access module 364.

Device access module 364 may include a request queue 367 and/or a response queue 368. Blocks of physical memory 337 may be allocated to implement each of the queues 347, 348, 357, 358, 367, and 368. In some implementations, memory blocks that are used to implement a queue, e.g., any of queues 347, 348, 357, 358, 367, and 368, may be contiguous. The memory blocks used to implement a queue may be accessible by the application that implements the device access module and by the standalone device access module 364. For example, request queue 347 may be modified by device access module 344 and device access module 364, but not by device access module 354 that is part of application 352. Memory blocks that are used to implement queues 367 and 368 may be restricted such that these can be accessed only by the module 364. The queues may be implemented based on IO parameters specified in the application profile, e.g., the cache type parameter, the read-buffer parameter, the write-buffer parameter, etc. described above.

In implementations that include standalone device access module 364, module 364 may coordinate fulfillment of IO requests from various applications, e.g., applications 342 and 352. For example, module 364 may communicate, e.g., via inter-process communication (IPC) messages 384 and 386 respectively, with modules 344 and 354, and service IO requests from respective applications 342 and 352. Module 364 may also include functionality to arbitrate between IO requests arriving from different applications, assign priorities based on request type, application identity, etc.

In some implementations, device access module 364 may be omitted. In these implementations, modules 344 and 354 may communicate directly, e.g., in a peer-to-peer manner, via inter-process communication messages 382. In these implementations, two or more applications may communicate with each other. In some implementations, the applications may share, in an asynchronous manner, a summary state of IO requests. For example, information exchanged between the applications may include control plane messages that allow hard arbitration. In some implementations, IPC messages 382 may include information indicative of IO requests generated by each application, e.g., a type of the request (CRUD), a priority of the request, a size of the IO requested, etc. Each of modules 344 and 354 may be implemented to evaluate the IO requests, and may selectively back-off, e.g., delay their own IO requests, in the presence of higher priority requests from other modules. In some implementations, IO requests from different modules may be fulfilled using round-robin techniques, by implementing an oldest-request serviced first prioritization of IO requests, or by other similar techniques. In some implementations, a module that is starved of access to a device (e.g., has a queue of unfulfilled IO requests larger than a threshold queue size) may be enabled to send IPC messages 382 requesting other modules to back-off.
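The peer-to-peer back-off decision can be sketched as a pure function over the summary state each module shares. The `Summary` fields, the convention that a larger number means higher priority, and the threshold value are assumptions; the IPC transport itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary state a module shares with its peers."""
    module_id: int
    top_priority: int   # assumed convention: larger means higher priority
    queue_length: int   # number of unfulfilled IO requests


def should_back_off(own, peers, starve_threshold=100):
    """Decide whether this module should delay its own IO requests."""
    if own.queue_length > starve_threshold:
        # A starved module keeps issuing requests rather than backing off.
        return False
    for peer in peers:
        if peer.queue_length > starve_threshold:
            return True   # a starved peer has asked others to back off
        if peer.top_priority > own.top_priority:
            return True   # a peer holds a higher-priority request
    return False
```

This sketch does not resolve the case where several modules are starved at once; a real implementation would need a tie-break such as round-robin or oldest-request-first, as mentioned above.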

In some implementations, IPC messages 382 may be exchanged between modules 344 and 354 in a peer-to-peer manner, even when a central standalone module 364 is implemented. In these implementations, each peer module (modules 344 and 354) may implement a back-off technique to reduce or delay its IO requests in the presence of IO requests from other applications. If a particular module is starved of device access (e.g., has a queue of unfulfilled IO requests larger than a threshold queue size), such a module may notify central module 364. Central standalone module 364 may be configured to send commands via IPC messages 384 and 386 to other modules to back-off (e.g., delay or cancel IO requests) when notified by the particular module. In some implementations, central standalone module 364 may perform an initial allocation of resources to peer modules, and subsequently, determine metrics for each peer module, and adjust the allocation based on the determined metrics.

In various implementations, device access module software code may be provided as a library that can be incorporated in any type of application that executes on computing device 300, e.g., within execution environment 340.

FIG. 4 illustrates an example method 400 for data transfer between a software application and a storage device. In some implementations, the method 400 may be implemented within a software application, e.g., by incorporating a software library that implements the method. In some implementations, the method may be implemented as a separate software driver that executes in the same application execution environment (e.g., a virtual machine) as a software application and that is accessible by the software application via an application-programming interface (API). In some implementations, a portion of the method may be implemented in a software application, and another portion of the method may be implemented as a software driver.

In the various implementations described herein, the software application is allocated user space memory within the physical memory of a computing device on which the software application executes. The user space memory allocated to the software application is accessible by executing code of the software application, including the software library that implements method 400. In the implementations in which the method is implemented as a separate software driver, at least a portion of the user space memory allocated to the software application is shared with the software driver, in a shared memory configuration. In these implementations, the software driver is configured such that it can read data from and/or write data to the portion of the user space memory that is shared. Such sharing enables the software driver to perform storage operations (e.g., read or write from a storage device) without making intermediate copies of data, as described below.
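The idea of reading directly into application-visible memory, rather than through an intermediate copy, can be loosely illustrated within a single process. In this sketch a `bytearray` stands in for the shared user space region and an `io.BytesIO` stands in for the storage device; both stand-ins and the function name are assumptions and do not model a real driver.

```python
import io


def read_into_shared(source, shared, offset, length):
    """Read up to `length` bytes from `source` directly into `shared`.

    `memoryview` slicing creates a window onto the existing buffer rather
    than a copy, so `readinto` fills the shared region in place.
    """
    window = memoryview(shared)[offset:offset + length]
    return source.readinto(window)


shared_region = bytearray(16)            # stand-in for shared user space memory
device = io.BytesIO(b"abcdefgh")         # stand-in for a storage device
bytes_read = read_into_shared(device, shared_region, 4, 8)
```

After the call, the application can consume bytes 4 through 11 of `shared_region` without any further copy being made on its behalf.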

At block 402, a data transfer request is received from a software application. For example, the request may be received from executing code of the software application by the software library incorporated within the software application, or by the software driver. For example, the data transfer request may be a request to access a storage device, e.g., a storage device that is part of a computing device that executes the method 400 or is accessible by the computing device over a device interface, such as a network interface. The data transfer request may be a request to read data from the storage device, or a request to write data to the storage device.

In some implementations, e.g., when the data transfer request is to write data to a storage device, the request may include the data to be written. In some implementations, the data may comprise one or more data units, each having a particular size, e.g., 1 KB, 1 MB, 10 MB, etc.

At block 404, a storage device is identified based on the data transfer request. For example, the request may include an identifier of the storage device, e.g., a hardware identifier, port to which the storage device is coupled, etc. For example, requests to read data may specify the device that stores the data. In another example, the request may not include an identifier of the storage device. For example, a request to write data may not specify a storage device to which the data is written. In another example, the data transfer request may specify device parameters (e.g., reliability, access speed, media type, etc.). In this example, a storage device is identified that has those parameters.

In some implementations, e.g., those that utilize storage containers that organize hardware storage units into logical groupings, identifying the storage device may include retrieving a storage container definition, e.g., the logical organization of hardware storage units into storage containers. In these implementations, a request to read data may be analyzed to determine a storage container specified in the data transfer request, and the storage container definition is accessed to map from a logical address within the storage container to a physical address (e.g., a particular page or block of an SSD device) where the data is stored. In some implementations, a request to write data may be analyzed to identify a storage container that is suitably configured for the data to be written. For example, the request to write data may specify parameters such as a time limit within which the data is to be written, a reliability requirement for the data, etc. Based on the parameters, the storage container definition is accessed to determine the storage container, and a hardware storage device within the storage container is identified. The method continues to block 406.
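As an illustration of block 404, the following sketch selects a storage container from request parameters and resolves a logical address through a container definition. The `StorageContainer` fields, the numeric reliability score, the latency figures, and the container names are all assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class StorageContainer:
    """Hypothetical storage container definition."""
    name: str
    reliability: int      # assumed score; higher means more reliable
    max_write_ms: float   # assumed worst-case write latency
    # Logical page number -> (device name, block, page).
    page_map: dict = field(default_factory=dict)

    def resolve(self, logical_page):
        # Map a logical address in this container to a physical address.
        return self.page_map[logical_page]


def select_container(containers, min_reliability, time_limit_ms):
    """Pick the fastest container that satisfies the write request."""
    candidates = [
        c for c in containers
        if c.reliability >= min_reliability and c.max_write_ms <= time_limit_ms
    ]
    if not candidates:
        raise LookupError("no storage container satisfies the request")
    return min(candidates, key=lambda c: c.max_write_ms)


# Example definitions; device names and addresses are made up.
CONTAINERS = [
    StorageContainer("fast", 1, 0.5, {0: ("ssd0", 12, 3)}),
    StorageContainer("durable", 3, 2.0, {0: ("ssd1", 7, 0)}),
]
```

A read request that names a container would use `resolve`, while a write request that only states requirements would go through `select_container` first.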

At block 406, a command is sent to the identified storage device. The command includes identification of hardware storage units (e.g., physical address) within the storage device that are to be accessed to fulfill the data transfer request. For example, when the storage device is a solid-state storage device (SSD), the hardware storage units may be a memory cell (e.g., a flash memory cell), a page comprising a plurality of memory cells, a storage block comprising a plurality of pages, a chip comprising a plurality of storage blocks, etc. The SSD may include a plurality of chips, organized into one or more channels. Each hardware storage unit in the storage device may be associated with a respective address. For example, the address of a hardware storage unit may be specified as a combination of SSD device name, channel, chip, block, page, etc.
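One way to picture an address specified as a combination of device name, channel, chip, block, and page is the sketch below, which decodes a flat page index into that hierarchy. The `Geometry` values and the row-major ordering are assumptions for illustration; an actual device may lay out its units differently.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    """Hypothetical SSD layout."""
    channels: int
    chips_per_channel: int
    blocks_per_chip: int
    pages_per_block: int


class UnitAddress(NamedTuple):
    device: str
    channel: int
    chip: int
    block: int
    page: int


def decode(device: str, flat_page: int, g: Geometry) -> UnitAddress:
    """Split a flat page index into (channel, chip, block, page)."""
    total = g.channels * g.chips_per_channel * g.blocks_per_chip * g.pages_per_block
    if not 0 <= flat_page < total:
        raise ValueError("page index outside device geometry")
    rest, page = divmod(flat_page, g.pages_per_block)
    rest, block = divmod(rest, g.blocks_per_chip)
    channel, chip = divmod(rest, g.chips_per_channel)
    return UnitAddress(device, channel, chip, block, page)


def build_command(op: str, addresses) -> dict:
    """Assemble a command that names the hardware storage units to access."""
    return {"op": op, "units": list(addresses)}
```

For example, with `Geometry(2, 2, 4, 8)`, flat page 77 decodes to channel 1, chip 0, block 1, page 5.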

In some implementations, sending the command may include sending data to the storage device. For example, if the data transfer request is to write data to the storage device, the command may include one or more addresses of hardware storage units within the storage device, and respective data units to be written to the hardware storage units. In these implementations, the data to be written to the storage device is read directly from user space memory allocated to the application and is sent to the storage device. In the implementations where a software driver implements the method, the data transfer request may include a pointer to a memory address within the user space memory allocated to the software application from which the data is to be retrieved. Reading data directly from user space memory reduces memory requirements, since no intermediate copies of the data are stored, and also reduces the time required for write operations since write operations are completed without having to make intermediate copies.
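The idea of writing data units straight from application memory, addressed by an offset that plays the role of the pointer, can be approximated as follows. This is only an analogy under stated assumptions: a temporary file stands in for the storage device, and the calls still pass through the operating system, so the sketch shows the absence of intermediate copies at the application level rather than the absence of a context switch. The function names are hypothetical.

```python
import tempfile


def write_units(device, app_memory, units):
    """Write data units from application memory to the device.

    units: iterable of (device_offset, memory_offset, length).
    """
    view = memoryview(app_memory)
    written = 0
    for device_offset, memory_offset, length in units:
        device.seek(device_offset)
        # Slicing a memoryview references the same bytes; no copy is made.
        written += device.write(view[memory_offset:memory_offset + length])
    return written


def demo_write():
    app_memory = bytearray(b"AAAABBBBCCCC")
    # Unbuffered temporary file as a stand-in for the storage device.
    with tempfile.TemporaryFile(buffering=0) as device:
        device.write(bytes(16))          # 16 zero bytes of "device" space
        count = write_units(device, app_memory, [(8, 4, 4)])
        device.seek(0)
        return count, device.read()
```

Here `demo_write()` places the four bytes `BBBB` from offset 4 of the application buffer at offset 8 of the stand-in device.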

In conventional systems, sending the command and/or data to a storage device from an application requires a context switch to the operating system, since the operating system is responsible for managing access to storage devices, and stores information (e.g., device addresses, filesystem, file metadata, etc.) that is necessary to identify the storage device and to generate the command. In some implementations of the present disclosure, sending the data to the storage device is performed without a context switch from the software application to an operating system. The present disclosure eliminates the context switch, e.g., since the hardware address that the data is to be written to or read from is known to the software application or software driver that manages data transfer to the storage device. Sending the command and/or data without the context switch reduces the time required for the data to be sent and may enable the data transfer request to be completed more quickly than in conventional systems that require a context switch. The method continues to block 408.

In block 408, a response is received from the storage device. For example, the response may indicate success (e.g., data was written successfully to the storage device) or failure (e.g., a request to read data from the storage device was unsuccessful). In some implementations, e.g., when the request is to read data from the storage device, receiving the response includes receiving the data that is read from the hardware storage units that were specified in the command. In these implementations, the method 400 may further include writing the received data directly to user space memory that is accessible by the software application, e.g., user space memory that is allocated to the application, without making intermediate copies. In the implementations where a software driver performs the read operations, a portion of the user space memory allocated to the software application may be shared with the software driver, such that the software driver can read data from and write data to the portion of the user space memory that is shared.
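The read path, in which received data lands directly in memory owned by the application and the response reports success or failure, can be sketched in the same spirit. As before, a temporary file is an assumed stand-in for the storage device, the operating system is still involved in this analogy, and the function names and response fields are hypothetical.

```python
import tempfile


def read_units(device, app_memory, units):
    """Read data units from the device into application memory in place.

    units: iterable of (device_offset, memory_offset, length).
    Returns a response dict indicating success or failure.
    """
    view = memoryview(app_memory)
    total = 0
    for device_offset, memory_offset, length in units:
        device.seek(device_offset)
        # readinto fills the caller's buffer; no temporary bytes object.
        n = device.readinto(view[memory_offset:memory_offset + length])
        if n != length:
            return {"status": "failure", "bytes": total}
        total += n
    return {"status": "success", "bytes": total}


def demo_read(units):
    app_memory = bytearray(8)
    with tempfile.TemporaryFile(buffering=0) as device:
        device.write(b"0123456789abcdef")
        response = read_units(device, app_memory, units)
    return response, bytes(app_memory)
```

Calling `demo_read([(10, 2, 4)])` copies `abcd` into the middle of the application buffer, while a request that runs past the end of the stand-in device, such as `demo_read([(14, 0, 4)])`, yields a failure response.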

In conventional systems where an operating system manages access to hardware storage devices, data that is read from a hardware storage device may be first received in kernel space memory that is accessible by the operating system, but not by a software application. In these systems, to make the data accessible to the application, the data needs to be copied to user space memory, and a context switch to the software application needs to be performed upon completion of the copying. Such transfer of data to kernel space, followed by copying it to user space memory prior to the context switch, can reduce throughput, e.g., since the application may spend a longer time waiting for the data to be read. Directly accessing the storage device without a context switch to the operating system, and receiving the data and writing it to user space memory, can improve throughput, since the application can utilize the data as soon as it is written to the user space memory. The method continues to block 410.

In block 410, the response is provided, e.g., by the software driver, or by the library that is part of the software application, to the software application (e.g., executing code) that sent the data transfer request. For example, if the request is a request to write data to storage, the response may indicate success or failure of the request. If the request is a request to read data from storage, providing the response may include providing a pointer to a memory address within the user space memory where the data is written.

Method 400 has technical benefits over prior techniques to access storage devices to fulfill a data transfer request. For example, in conventional techniques, writing data to the storage device may require multiple steps. For example, if the application is implemented within an execution environment such as a virtual machine executing on top of an operating system, e.g., that arbitrates hardware access requests from multiple applications, IO requests from the application may be passed on to the operating system (OS). The OS may then implement a hardware driver, e.g., that accesses the storage device, executing in kernel space. In this example, writing data to a storage device requires a context switch from execution of the application code to execution of OS code to perform a write to the hardware storage device. Confirmation that the data was written to the storage device may be provided to the OS, which in turn may indicate to the application that the IO request was completed. The multiple steps may be costly, e.g., in terms of hardware resource utilization, time taken to complete an IO request, etc.

In contrast, some implementations of the techniques described herein may allow access to a device, e.g., a storage device, directly from the user space, e.g., by the software application or by the software driver. Such access may be referred to as “zero copy.” In these implementations, read and write operations may be performed without a context switch to the operating system, as explained above. A user space driver may be used by one or more applications to directly access the device via a zero copy mechanism.

FIG. 5 illustrates a block diagram of an example environment 500 which may be used for one or more implementations described herein. As illustrated in FIG. 5, a computing device 502 includes a processor 504 and memory 506 coupled to the processor 504. A software application 508 (e.g., executable code of the software application) is loaded in memory 506, e.g., in an application execution environment, for execution by processor 504. Computing device 502 is coupled to a device interface 514 which in turn is coupled to storage device(s) 516. In some implementations, device interface 514 may be a peripheral interface (e.g., USB) or a network interface. In some implementations (not shown), storage device(s) 516 may be part of computing device 502, e.g., as internal storage device(s) such as hard disk drives, SSD storage, etc.

In the example illustrated in FIG. 5, software application 508 includes a storage driver 510. For example, storage driver 510 may be executable code that is part of the software application, or incorporated as a library. In some implementations, the storage driver may be implemented separate from the software application, but within the application execution environment. In these implementations, the software application may communicate with the storage driver 510 via inter-process communication (IPC) or by utilizing an application programming interface (API). Storage driver 510 enables software application 508 to access storage device(s) 516 by specifying a data transfer request.

Memory 506 may include application data 512. For example, a portion of memory 506 may be allocated to software application 508, e.g., by an application execution environment such as a hypervisor, or by an operating system. While FIG. 5 shows a single software application 508, it will be understood that any number of software applications may be stored in memory 506. Application data 512 may be accessible by software application 508 and storage driver 510, but not by other applications that execute on computing device 502.

As explained with reference to FIG. 4, data may be transferred directly between application data 512 and storage device(s) 516 via direct memory access 520. Direct memory access 520 refers to access of a storage device from software application 508 without a context switch to an operating system of computing device 502, such that data from application data 512 is written directly to storage device(s) 516, and data read from storage device(s) 516 is directly written to application data 512.

With the use of direct memory access (DMA), no intermediate copies of data are made, such that a write operation results in data from the software application, e.g., within user space memory allocated to the software application, being written directly to the storage device, and a read operation results in data being read from the storage device directly into the user space memory allocated to the software application. By performing storage device access in this manner, the methods described herein enable fast access to storage devices with a predictable rate of access, thereby allowing software applications to perform data access within specific time periods. In some implementations, the rate of access may be deterministic, e.g., where the time required for data access is proportional to the amount of data accessed and known ahead of accessing the storage device.
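When the access time is proportional to the amount of data, an application can budget its accesses before issuing them. The sketch below assumes a fixed, known device rate in bytes per second and uses integer microseconds to avoid rounding surprises; the function names and the example rate are hypothetical.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def estimated_access_us(num_bytes: int, bytes_per_second: int) -> int:
    """Time to access num_bytes at a fixed rate, rounded up to whole microseconds."""
    # Ceiling division on integers: -(-a // b).
    return -(-num_bytes * MICROSECONDS_PER_SECOND // bytes_per_second)


def max_bytes_within(deadline_us: int, bytes_per_second: int) -> int:
    """Largest transfer that completes within the deadline at a fixed rate."""
    return deadline_us * bytes_per_second // MICROSECONDS_PER_SECOND


def fits_deadline(num_bytes: int, bytes_per_second: int, deadline_us: int) -> bool:
    """Decide ahead of the access whether it can finish in time."""
    return estimated_access_us(num_bytes, bytes_per_second) <= deadline_us
```

For instance, at an assumed rate of 500,000,000 bytes per second, a 10,000,000-byte access is estimated at 20,000 microseconds, so it would fit a 25 ms deadline but not a 4 ms one.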

One or more methods described herein (e.g., method 200 and/or method 400) can be implemented by computer program instructions or code, which can be executed on a computer. For example, the code can be implemented by one or more digital processors (e.g., microprocessors or other processing circuitry or hardware), and can be stored on a computer program product including a non-transitory computer-readable medium (e.g., storage medium), e.g., a magnetic, optical, electromagnetic, or semiconductor storage medium, including semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), flash memory, a rigid magnetic disk, an optical disk, a solid-state memory drive, etc.

The program instructions can also be contained in, and provided as, an electronic signal, for example in the form of software as a service (SaaS) delivered from a server (e.g., a distributed system and/or a cloud computing system). Alternatively, one or more methods can be implemented in hardware (logic gates, etc.), or in a combination of hardware and software. Example hardware can be programmable processors (e.g., field-programmable gate array (FPGA), complex programmable logic device), general purpose processors, graphics processing units (or GPUs), application specific integrated circuits (ASICs), and the like. One or more methods can be performed as part of or as a component of an application running on the system, or as an application or software running in conjunction with other applications and an operating system.

One or more methods described herein can be run in a standalone program that can be run on any type of computing device, a program run in a web browser, a server application that executes on a single computer, a distributed application that executes on multiple computers, etc. In one example, a client/server architecture can be used, e.g., a mobile computing device (as a client device) sends user input data to a server device and receives from the server the final output data for output (e.g., for display). In another example, computations can be split between the mobile computing device and one or more server devices.

Although the description has been provided with respect to particular implementations thereof, these particular implementations are merely illustrative, and not restrictive. Concepts illustrated in the examples may be applied to other examples and implementations. Note that the functional blocks, operations, features, methods, devices, and systems described in the present disclosure may be integrated or divided into different combinations of systems, devices, and functional blocks. Any suitable programming language and programming techniques may be used to implement the routines of particular implementations. Different programming techniques may be employed, e.g., procedural or object-oriented. The routines may execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, the order may be changed in different particular implementations. In some implementations, multiple steps or operations shown as sequential in this specification may be performed at the same time.

1. A computer-implemented method, comprising: receiving an input-output (IO) request from an application; determining an application profile for the application; based at least in part on the application profile, setting one or more IO parameter values to access a device; allocating a buffer for the application; and accessing the device based on the one or more IO parameter values to fulfill the IO request, wherein the buffer is used to store application data corresponding to the IO request.
2. The computer-implemented method of claim 1, wherein accessing the device comprises: sending a command to the device directly from a software application executed by the processor without a context switch from the software application to an operating system.
3. The computer-implemented method of claim 2, wherein the IO request is to read data from the device, and wherein the IO request specifies a memory address within user space memory allocated to the software application, and wherein accessing the device further comprises: receiving the data from the storage device; and writing the data to the user space memory allocated to the software application, based on the memory address.
4. The computer-implemented method of claim 1, further comprising, prior to receiving the IO request, determining an application type of the application based on a configuration setting, and wherein the determining the application type is performed in response to detecting that the application has launched.
5. The computer-implemented method of claim 1, wherein determining the application profile comprises one of: analyzing a plurality of prior IO requests from the application to determine a respective proportion of create, read, update, and delete (CRUD) operations in the plurality of prior IO requests; or analyzing the plurality of prior IO requests to determine a proportion of IO requests that result in a cache invalidation or a cache miss.
6. The computer-implemented method of claim 1, further comprising identifying the device based on a type of operation specified in the IO request.
7. The computer-implemented method of claim 1, wherein the IO request includes an update operation, wherein the update operation comprises: determining whether a read-before-write paradigm is to be utilized; if it is determined that the read-before-write paradigm is to be utilized, reading a current value from the device prior to writing a new value; and if it is determined that the read-before-write paradigm is not to be utilized, overwriting the current value on the device with the new value.
8. A non-transitory computer-readable medium with instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving an input-output (IO) request from an application; determining an application profile for the application; based at least in part on the application profile, setting one or more IO parameter values to access a device; allocating a buffer for the application; and accessing the device based on the one or more IO parameter values to fulfill the request, wherein the buffer is used to store application data corresponding to the IO request.
9. The non-transitory computer-readable medium of claim 8, wherein the operations further comprise, prior to receiving the IO request, determining an application type of the application based on a configuration setting, and wherein the determining the application type is performed in response to detecting that the application has launched.
10. The non-transitory computer-readable medium of claim 8, wherein determining the application profile is based at least in part on a plurality of prior IO requests from the application.
11. The non-transitory computer-readable medium of claim 10, wherein the operations further comprise analyzing the plurality of prior IO requests to determine a respective proportion of create, read, update, and delete (CRUD) operations in the plurality of prior IO requests.
12. The non-transitory computer-readable medium of claim 10, wherein the operations further comprise analyzing the plurality of prior IO requests to determine a proportion of IO requests that result in a cache invalidation or a cache miss.
13. A computing device comprising: a processor; a storage device coupled to the processor; and a memory coupled to the processor with instructions stored thereon that, when executed by the processor, cause the processor to perform operations comprising: receiving an input-output (IO) request from an application; determining an application profile for the application; based at least in part on the application profile, setting one or more IO parameter values to access the storage device; allocating a buffer for the application; and accessing the storage device based on the one or more IO parameter values to fulfill the request, wherein the buffer is used to store application data corresponding to the IO request.
14. The computing device of claim 13, wherein accessing the storage device comprises: sending a command to the storage device directly from a software application executed by the processor without a context switch from the software application to an operating system.
15. The computing device of claim 14, wherein the IO request is to write data that comprises one or more data units, and the command specifies a respective physical address within one or more individual storage units of the storage device for the one or more data units, and wherein accessing the storage device further comprises sending the data to the storage device.
16. The computing device of claim 14, wherein the IO request is to write data to the storage device and includes a pointer to a memory address within user space memory allocated to the software application, and wherein sending the command comprises: reading the data directly from the user space memory based on the pointer; and sending the data to the storage device.
17. The computing device of claim 13, wherein the IO request is to read data from the storage device, and wherein the IO request specifies a memory address within user space memory allocated to the software application.
18. The computing device of claim 17, wherein accessing the storage device further comprises: receiving the data from the storage device; and writing the data to the user space memory allocated to the software application, based on the memory address.
19. The computing device of claim 13, wherein the operations further comprise, prior to receiving the IO request, determining an application type of the application based on a configuration setting, and wherein the determining the application type is performed in response to detecting that the application has launched.
20. The computing device of claim 13, wherein determining the application profile is based at least in part on a plurality of prior IO requests from the application.
21. The computing device of claim 20, wherein the operations further comprise at least one of: analyzing the plurality of prior IO requests to determine a respective proportion of create, read, update, and delete (CRUD) operations in the plurality of prior IO requests; or analyzing the plurality of prior IO requests to determine a proportion of IO requests that result in a cache invalidation or a cache miss.
22. The computing device of claim 20, wherein the operations further comprise: analyzing a size in bits of the plurality of prior IO requests, wherein the size in bits is of data accessed during the prior IO requests; and based on the size of the plurality of prior IO requests, determining a bandwidth used by the application.